
Vision & Robotics

Topics
vision
machine learning
robotics
multimodal learning

I. Overview

Visual perception, a primary source of information for humans, is the ability to interpret the surrounding environment through the visual system.
Characteristics of human intelligence
Hierarchical & sparse neurons
Multisensory & multimodal data processing
Independent, but flexible knowledge interaction
We focus on building human-like visual systems that leverage semantic and geometric knowledge collaboratively and interactively, and on understanding fundamental machine learning techniques such as self-supervised learning, multimodal learning, one-shot learning, domain adaptation, meta-learning, and active learning.

II. Semantic perception

Object detection and recognition
Semantic segmentation, instance segmentation, video segmentation
>> Comparison of semantic segmentation, classification and localization, object detection and instance segmentation (Li, Johnson and Yeung, 2017)
>> Object detection and segmentation in driving scenes (from AAAI’21 and ICCV’21)
>> Lane and road marking detection in dynamic scenes (from ICCV’17)
>> Semantic segmentation through unsupervised domain adaptation (from CVPR’20)
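
The difference between semantic segmentation and instance segmentation compared above can be illustrated with a toy example. The sketch below is not how the networks in the publications work; it only assumes that instances of one class do not touch, and splits a semantic map into 4-connected blobs. The function name `label_instances`, the class ids, and the grid are hypothetical.

```python
from collections import deque

def label_instances(semantic, target_class):
    """Split one semantic class into 4-connected blobs (toy instance ids)."""
    rows, cols = len(semantic), len(semantic[0])
    instances = [[0] * cols for _ in range(rows)]  # 0 means "no instance"
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            # Skip pixels of other classes and pixels already assigned.
            if semantic[r][c] != target_class or instances[r][c] != 0:
                continue
            next_id += 1
            instances[r][c] = next_id
            queue = deque([(r, c)])
            # Breadth-first flood fill over the 4-neighbourhood.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and semantic[ny][nx] == target_class
                            and instances[ny][nx] == 0):
                        instances[ny][nx] = next_id
                        queue.append((ny, nx))
    return instances, next_id

# Toy semantic map: 0 = background, 1 = "car" (class ids are made up).
semantic_map = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
instance_map, num_instances = label_instances(semantic_map, target_class=1)
```

In the semantic map every "car" pixel carries the same label, whereas the instance map gives each separate blob its own id. Learned instance segmentation also has to separate touching or occluding objects, which this simplification cannot do.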
Related publications
1. S. Lee, J. Kim, J.S. Yoon, S. Shin, O. Bailo, N. Kim, T.H. Lee, H.S. Hong, S.H. Han, and I.S. Kweon, “VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition”, IEEE International Conference on Computer Vision (ICCV) 2017. [Detection, Dataset, Multi-task learning]
2. J. Kim, S. Lee, T.H. Oh, and I.S. Kweon, “Co-domain Embedding using Deep Quadruplet Networks for Unseen Traffic Sign Recognition”, AAAI Conference on Artificial Intelligence (AAAI) 2018. [Classification, Few-shot learning]
3. J. Kim, T.H. Oh, S. Lee, F. Pan, and I.S. Kweon, “Variational Prototyping-Encoder: One-Shot Learning with Prototypical Images”, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019. [Classification, One-shot learning]
4. F. Pan, I. Shin, F. Rameau, S. Lee, and I.S. Kweon, “Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision”, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020. [Semantic segmentation, Domain adaptation]
5. J.S. Yoon, F. Rameau, J. Kim, S. Lee, S. Shin, and I.S. Kweon, “Pixel-Level Matching for Video Object Segmentation using Convolutional Neural Networks”, IEEE International Conference on Computer Vision (ICCV) 2017. [Video object segmentation]
6. O. Bailo, S. Lee, F. Rameau, J.S. Yoon, and I.S. Kweon, “Robust Road Marking Detection and Recognition Using Density-Based Grouping and Machine Learning Techniques”, IEEE Winter Conference on Applications of Computer Vision (WACV) 2017. [Detection]

III. 3D visual perception

Monocular depth and motion estimation
Correspondence problem: stereo matching, optical flow
3D reconstruction, visual odometry, simultaneous localization and mapping (SLAM)
Neural radiance fields (NeRF)
>> Monocular depth estimation + Unified visual odometry (from AAAI’21)
>> Real-time stereo matching & 3D point cloud visualization (from IROS’21)
>> Depth estimation via RGB-Thermal camera fusion (from RA-L’22/ICRA’22)
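
Stereo matching, listed above as a correspondence problem, relates to depth through a simple geometric relation. As a minimal sketch, assuming a rectified stereo pair and a pinhole camera model, depth is Z = f * B / d, where f is the focal length in pixels, B the baseline in meters, and d the disparity in pixels. The function name and the numeric values below are hypothetical and not taken from any of the listed systems.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair (pinhole model)."""
    if disparity_px <= 0:
        # Non-positive disparity has no finite depth; treated as infinitely far here.
        return float("inf")
    return focal_px * baseline_m / disparity_px

# Illustrative camera parameters (assumed, not from a real rig).
focal_px = 700.0
baseline_m = 0.5

# Larger disparity means a closer point.
depths = [disparity_to_depth(d, focal_px, baseline_m) for d in (70.0, 35.0, 7.0)]
```

Because depth is inversely proportional to disparity, a fixed matching error in pixels causes a much larger depth error for distant points than for nearby ones.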
Related publications
1. S. Lee, F. Rameau, F. Pan, and I.S. Kweon, “Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation”, IEEE International Conference on Computer Vision (ICCV) 2021. [Monocular depth, Self-supervised learning]
2. S. Lee, S. Im, S. Lin, and I.S. Kweon, “Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency”, AAAI Conference on Artificial Intelligence (AAAI) 2021. [Monocular depth, Self-supervised learning]
3. U. Shin, K. Lee, S. Lee, and I.S. Kweon, “Self-supervised Depth and Ego-motion Estimation for Monocular Thermal Video Using Multi-spectral Consistency Loss”, IEEE Robotics and Automation Letters (RA-L) 2022. [Monocular depth, Sensor fusion, Self-supervised learning]
4. S. Lee, S. Im, S. Lin, and I.S. Kweon, “Learning Residual Flow as Dynamic Motion from Stereo Videos”, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2019. [Optical flow, Self-supervised learning]
5. A. Bangunharcana, J.W. Cho, S. Lee, I.S. Kweon, K.S. Kim, and S. Kim, “Correlate-and-Excite: Real-Time Stereo Matching via Guided Cost Volume Excitation”, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2021. [Stereo matching]

IV. Multimodal learning

“Vision + X” for Y

>> Left: RGB + thermal (depth estimation)
>> Right: RGB + LiDAR (object detection)
>> Bottom: RGB + drone motion (autoencoder)

Modalities (= X) that we are interested in:

Language
Thermal camera
Event camera
LiDAR
Motion
Audio

Applications (= Y) that we are interested in:

Scene understanding
Object detection, 3D perception, segmentation, etc.
View synthesis, viewpoint manipulation, 3D rendering
Representation/transfer learning