AAPL: Adding Attributes to Prompt Learning for Vision-Language Models
Gahyeon Kim*, Sohee Kim*, Seokju Lee (* equal contribution)
CVPR Workshop on Prompting in Vision (CVPRw), Jun 2024
We present a novel approach to enhancing zero-shot generalization in vision-language models by addressing the bias toward seen classes in traditional prompt learning methods such as CoOp and CoCoOp. Our method, Adding Attributes to Prompt Learning (AAPL), uses adversarial token embedding to separate low-level visual features from high-level class information, improving generalization to unseen classes. Experiments across 11 datasets show that AAPL outperforms existing methods on few-shot, zero-shot, cross-dataset, and domain generalization tasks.
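To make the idea concrete, below is a minimal PyTorch-style sketch of a CoOp-like prompt learner extended with an attribute "delta" token and a triplet-style adversarial objective that pushes that token toward augmentation attributes and away from class content. The class names, dimensions, prompt layout, and exact loss form are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: CoOp-like learnable context tokens plus a "delta" attribute token
# derived from an augmented view, trained with a triplet-style objective so the
# delta token captures low-level (augmentation) attributes rather than class identity.
# All names, dimensions, and the loss form are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttributePromptLearner(nn.Module):
    def __init__(self, n_ctx: int = 16, dim: int = 512, n_classes: int = 10):
        super().__init__()
        # Learnable context tokens shared across classes (CoOp-style).
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        # Class-name token embeddings would come from CLIP's tokenizer; random
        # placeholders keep the sketch self-contained.
        self.register_buffer("cls_tokens", torch.randn(n_classes, dim))
        # Projects an image-feature difference into a "delta" attribute token.
        self.delta_proj = nn.Linear(dim, dim)

    def forward(self, img_feat: torch.Tensor, aug_feat: torch.Tensor):
        # Low-level attribute signal: difference between augmented and clean views.
        delta = self.delta_proj(aug_feat - img_feat)              # (B, dim)
        B, n_cls = img_feat.size(0), self.cls_tokens.size(0)
        ctx = self.ctx.unsqueeze(0).expand(B, -1, -1)             # (B, n_ctx, dim)
        cls = self.cls_tokens.unsqueeze(0).expand(B, -1, -1)      # (B, n_cls, dim)
        # Prompt = [context tokens | delta token | class token] per class.
        prompts = torch.cat(
            [ctx.unsqueeze(1).expand(-1, n_cls, -1, -1),
             delta[:, None, None, :].expand(-1, n_cls, -1, -1),
             cls.unsqueeze(2)], dim=2)                            # (B, n_cls, n_ctx+2, dim)
        return delta, prompts


def decomposition_loss(delta: torch.Tensor, attr_feat: torch.Tensor,
                       class_feat: torch.Tensor, margin: float = 0.2):
    """Triplet-style objective: pull the delta token toward an attribute
    (augmentation) embedding and push it away from the class embedding,
    separating low-level attributes from high-level class content."""
    return F.triplet_margin_loss(delta, attr_feat, class_feat, margin=margin)


if __name__ == "__main__":
    model = AttributePromptLearner()
    img_feat = torch.randn(4, 512)   # e.g. frozen CLIP image features
    aug_feat = torch.randn(4, 512)   # features of an augmented view
    delta, prompts = model(img_feat, aug_feat)
    loss = decomposition_loss(delta, torch.randn(4, 512), torch.randn(4, 512))
    loss.backward()
    print(prompts.shape, float(loss))
```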
MoDA: Leveraging Motion Priors from Videos for Advancing Unsupervised Domain Adaptation in Semantic Segmentation
Fei Pan*, Xu Yin*, Seokju Lee, Axi Niu, Sungeui Yoon, In So Kweon (* equal contribution)
CVPR Workshop on Learning with Limited Labelled Data for Image and Video Understanding (CVPRw), Jun 2024 (Best Paper Award)
We present MoDA, a novel unsupervised domain adaptation (UDA) framework for semantic segmentation designed for the setting where the target domain provides unlabeled video frames. MoDA leverages self-supervised object motion cues to improve cross-domain alignment: an object discovery module segments moving objects, and a semantic mining module uses them to refine pseudo labels, which are then fed into a self-training loop to bridge the domain gap. Experimental results show that MoDA effectively exploits object motion for domain alignment and can complement existing state-of-the-art UDA approaches.
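The sketch below illustrates the self-training idea in simplified form: pseudo labels on unlabeled target frames are filtered by confidence and then relaxed inside regions flagged by a self-supervised motion mask. The motion-mask source, thresholds, and refinement rule are assumptions for illustration; the paper's object discovery and semantic mining modules are more elaborate.

```python
# Simplified self-training loop: confidence-filtered pseudo labels on target frames,
# relaxed inside regions that a motion mask marks as moving objects. Thresholds and
# the refinement rule are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn.functional as F

IGNORE_INDEX = 255  # common "no label" convention in semantic segmentation


def refine_pseudo_labels(logits: torch.Tensor, motion_mask: torch.Tensor,
                         conf_thresh: float = 0.9, moving_thresh: float = 0.5):
    """logits: (B, C, H, W) target-domain predictions; motion_mask: (B, H, W) in [0, 1]
    from a self-supervised motion cue (e.g. optical-flow consistency)."""
    prob = logits.softmax(dim=1)
    conf, pseudo = prob.max(dim=1)                                # (B, H, W)
    # Base rule: keep only confident predictions as pseudo labels.
    pseudo = torch.where(conf > conf_thresh, pseudo,
                         torch.full_like(pseudo, IGNORE_INDEX))
    # Simplified "semantic mining": inside moving-object regions, accept the
    # prediction even at lower confidence, since the motion cue independently
    # confirms that an object is present there.
    moving = motion_mask > moving_thresh
    relaxed = logits.argmax(dim=1)
    pseudo = torch.where(moving & (pseudo == IGNORE_INDEX), relaxed, pseudo)
    return pseudo


def self_training_step(model, target_images, motion_mask, optimizer):
    """One self-training iteration on unlabeled target video frames."""
    with torch.no_grad():
        pseudo = refine_pseudo_labels(model(target_images), motion_mask)
    loss = F.cross_entropy(model(target_images), pseudo, ignore_index=IGNORE_INDEX)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss


if __name__ == "__main__":
    import torch.nn as nn
    model = nn.Conv2d(3, 19, kernel_size=1)   # stand-in for a segmentation network
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    frames = torch.randn(2, 3, 64, 64)        # unlabeled target frames
    motion = torch.rand(2, 64, 64)            # self-supervised motion mask
    print(float(self_training_step(model, frames, motion, opt)))
```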