[Study] Reading self-supervised learning backwards, Season 3: Visual Language Models
| Week | Type | Paper title | Affiliation | ArXiv publication date | Speaker | YouTube |
|---|---|---|---|---|---|---|
| Week 1 | VLM benchmarks and metrics | Introduction to VLM-related benchmarks and metrics | | | 강재욱 | YouTube 1, YouTube 2 |
| Week 2 | Vision transformer | ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | Google | 2020 Oct | 이인규 | YouTube |
| Week 3 | Dual encoder | CLIP: Learning Transferable Visual Models From Natural Language Supervision | OpenAI | 2021 Feb | 김희은 | YouTube |
| Week 4 | Image-text matching | Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers | MS | 2020 Apr | 신성호 | YouTube |
| Week 5 | Image-text contrastive learning | ALBEF: Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | Salesforce | 2021 Jul | 이유경 | YouTube |
| Week 6 | Masked Image Modeling | BEiT: BERT Pre-Training of Image Transformers | MS | 2021 Jun | 박민지 | |
| Week 7 | Masked VLM | Masked Vision and Language Modeling for Multi-modal Representation Learning | Amazon | 2022 Aug | 김강민 | YouTube |
| Week 8 | Multimodal fusion by MoE | VLMO: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts | MS | 2021 Nov | 백혜림 | YouTube |
| Week 9 | Multimodal fusion by merged attention | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | Google | 2021 Aug | 정윤성 | YouTube |
| Week 10 | Multimodal fusion by co-attention | CoCa: Contrastive Captioners are Image-Text Foundation Models | Google | 2022 May | 김승우 | YouTube |
| Week 11 | Few-shot learning in VLM | Flamingo: a Visual Language Model for Few-Shot Learning | DeepMind | 2022 Apr | 조성국 | YouTube |
| Week 12 | Model scaling for VLM 1 | GIT: A Generative Image-to-text Transformer for Vision and Language | MS | 2022 May | 김기범 | YouTube |
| Week 13 | Model scaling for VLM 2 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | Google | 2022 Sep | 이영수 | |
| Week 14 | Wrap-up | Recap of the overall flow | | | 강재욱 | |
About
Reading SSL backwards, Season 3 - VLM