“Vision + X” for Y
>> Left: RGB + thermal (depth estimation)
>> Right: RGB + LiDAR (object detection)
>> Bottom: RGB + drone motion (autoencoder)
Modalities (= X) that we are interested in:
• Language
• Thermal camera
• Event camera
• LiDAR
• Motion
• Audio
Applications (= Y) that we are interested in:
• Scene understanding
  ◦ Object detection, 3D perception, segmentation, etc.
• View synthesis, viewpoint manipulation, 3D rendering
• Representation/transfer learning