MengyuanChen21/Awesome-Visual-Dialog

Awesome Visual Dialog

A curated publication list on visual dialog.

This repository is intended to help readers navigate mainstream research on visual dialog.
Note that only papers accepted at conferences are listed here: accepted papers for reliability, and conferences only for brevity.

Last updated: 2023/5/7 (not finished yet)

Performance Tables

Visual dialog models, in both the generative and the discriminative setting, are evaluated with retrieval-based metrics: mean reciprocal rank (MRR), recall@k (R@k), mean rank (Mean), and normalized discounted cumulative gain (NDCG). Every question in VisDial comes with a list of 100 answer candidates, exactly one of which is the ground-truth answer. The model sorts the candidates by their log-likelihood scores and is then evaluated with the four metrics. MRR, R@k, and Mean depend only on the rank of the single ground-truth answer, whereas NDCG takes every relevant answer in the 100-candidate list into account, using densely annotated relevance scores for all candidates. Higher is better for MRR, R@k, and NDCG; lower is better for Mean (marked ↓ in the tables). The community regards NDCG as the primary evaluation metric.
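As a rough illustration of how these four metrics can be computed, here is a minimal sketch in plain Python. It is not taken from any of the listed implementations. The function names are hypothetical, ties are broken in favor of the ground truth, and NDCG is truncated at k = the number of candidates with non-zero relevance, which is assumed here to match the usual VisDial convention.

```python
import math


def rank_of_ground_truth(scores, gt_index):
    """Return the 1-based rank of the ground-truth candidate.

    Candidates are ranked by descending score. Ties are resolved in
    favor of the ground truth (an assumption of this sketch).
    """
    gt_score = scores[gt_index]
    return 1 + sum(1 for s in scores if s > gt_score)


def sparse_metrics(ranks, ks=(1, 5, 10)):
    """Compute MRR, mean rank, and R@k from a list of ground-truth ranks."""
    n = len(ranks)
    metrics = {
        "mrr": sum(1.0 / r for r in ranks) / n,
        "mean": sum(ranks) / n,
    }
    for k in ks:
        metrics[f"r@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


def ndcg(scores, relevance):
    """Compute NDCG for one question from dense relevance scores.

    k is the number of candidates with non-zero relevance (assumed
    convention); both DCG and the ideal DCG are truncated at k.
    """
    k = sum(1 for r in relevance if r > 0)
    if k == 0:
        return 0.0
    # Candidate indices ordered by descending model score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dcg = sum(relevance[i] / math.log2(pos + 2)
              for pos, i in enumerate(order[:k]))
    ideal = sorted(relevance, reverse=True)[:k]
    idcg = sum(rel / math.log2(pos + 2) for pos, rel in enumerate(ideal))
    return dcg / idcg
```

In practice `scores` would hold 100 values per question. The functions return fractions in [0, 1], while the tables below report MRR, R@k, and NDCG as percentages.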

Where available, links to implementations are given together with the framework they use. The prefixes 'o-' and 'u-' mark official and unofficial implementations, respectively.

[Note]
*: re-implemented results
†: use of dense labels
‡: use of additional knowledge

Discriminative Methods

VisDial v0.9 val

ID Year Venue Model (or Authors) MRR R@1 R@5 R@10 MEAN↓ code
1 2017 CVPR LF 58.07 43.82 74.68 84.07 5.78 [o-torch]
2 2017 CVPR HRE 58.46 44.67 74.50 84.22 5.72 [o-torch]
3 2017 CVPR HREA 58.68 44.82 74.81 84.36 5.66 [o-torch]
4 2017 CVPR MN 59.65 45.55 76.22 85.37 5.46 [o-torch]
5 2017 NeurIPS HCIAE 62.22 48.48 78.75 87.59 4.81 [o-pytorch]
6 2017 NeurIPS AMEM 62.27 48.53 78.66 87.43 4.86
7 2018 CVPR CoAtt 63.98 50.29 80.71 88.81 4.47
8 2018 CVPR SF 62.42 48.55 78.96 87.75 4.70
9 2018 ECCV CorefNMN 64.10 50.92 80.18 88.81 4.45
10 2019 CVPR VGNN 62.85 48.95 79.65 88.36 4.57 [o-pytorch]
11 2019 CVPR RvA 66.34 52.71 82.97 90.73 3.93 [o-pytorch]
12 2019 CVPR FGA 65.25 51.43 82.08 89.56 4.35
13 2019 IJCAI DVAN 66.67 53.62 82.85 90.72 3.93
14 2019 EMNLP DAN 66.38 53.33 82.42 90.38 4.04 [o-pytorch]
15 2019 ICCV HACAN 67.92 54.76 83.03 90.68 3.97
16 2020 AAAI DualVD 62.94 48.64 80.89 89.94 4.17 [o-pytorch]
17 2020 CVPR CAG 67.56 54.64 83.72 91.48 3.75
18 2020 ACL MVAN 67.65 54.65 83.85 91.47 3.73 [o-pytorch]
19 2020 EMNLP VD-BERT 70.04 57.79 85.34 92.68 4.04 [o-pytorch]
20 2022 ICASSP VU-BERT 63.33 48.71 81.03 89.10 4.19
21 2022 MM AlignVD 71.65 59.64 88.30 94.72 2.96

VisDial v1.0 val

ID Year Venue Model (or Authors) NDCG MRR R@1 R@5 R@10 MEAN↓ code
1 2017 CVPR MN* 55.13 60.42 46.09 78.14 88.05 4.63
2 2017 NeurIPS HCIAE* 57.65 62.96 48.94 80.50 89.66 4.24
3 2018 CVPR CoAtt* 57.72 62.91 48.86 80.41 89.83 4.21
4 2019 ACL ReDan 59.32 64.21 50.60 81.39 90.26 4.05
6 2020 ECCV VisDial-BERT 64.94 69.10 55.88 85.50 93.29 3.25 [o-pytorch]
8 2020 ECCV LTMI 62.72 62.32 48.94 78.65 87.88 4.86
9 2020 ACL MVAN 60.17 65.33 51.86 82.40 90.90 3.88 [o-pytorch]
10 2020 ACL MCA 60.27 64.33 51.12 80.91 89.65 4.24 [o-pytorch]
12 2020 EMNLP VD-BERT 63.22 67.44 54.02 83.96 92.33 3.53
13 2021 EMNLP SGLKT 63.41 63.34 - - - -
14 2021 EMNLP SGLKT 74.54 59.10 - - - -
15 2022 ICASSP ICMU 64.30 69.14 56.80 85.09 93.42 3.37
16 2022 CVPR UTC 63.22 68.58 55.48 85.38 93.20 3.28
17 2022 MM AlignVD 67.22 70.45 57.64 87.06 94.20 3.05

VisDial v1.0 val (with dense labels)

ID Year Venue Model (or Authors) NDCG MRR R@1 R@5 R@10 MEAN↓ code
1 2020 CVPR P1+P2 73.63 50.56 37.99 63.98 77.95 7.26 [o-pytorch]
2 2020 ECCV VisDial-BERT 75.24 52.22 39.92 65.05 80.63 6.17 [o-pytorch]
3 2020 ACL MCA 72.18 46.92 32.09 63.85 78.06 7.37 [o-pytorch]

VisDial v1.0 test-std

ID Year Venue Model (or Authors) NDCG MRR R@1 R@5 R@10 MEAN↓ code
1 2017 CVPR LF 45.21 55.42 40.95 72.45 82.83 5.95 [o-torch]
2 2017 CVPR HRE 45.46 54.16 39.93 70.45 81.50 6.41 [o-torch]
3 2017 CVPR MN 47.50 55.49 40.98 72.30 83.30 5.92 [o-torch]
4 2018 ECCV CorefNMN 54.70 61.50 47.55 78.10 88.80 4.40
5 2019 CVPR VGNN 52.82 61.37 47.33 77.98 87.83 4.57 [o-pytorch]
6 2019 CVPR Sync 57.32 62.20 47.90 80.43 89.95 4.17
7 2019 CVPR RvA 55.59 63.03 49.03 80.40 89.83 4.18 [o-pytorch]
8 2019 CVPR FGA 52.10 63.70 49.58 80.97 88.55 4.51
9 2019 IJCAI DVAN 54.70 62.58 48.90 79.35 89.03 4.36
10 2019 EMNLP DAN 57.59 63.20 49.63 79.75 89.35 4.30 [o-pytorch]
11 2019 ICCV HACAN 57.17 64.22 50.88 80.63 89.45 4.20
12 2019 ACL ReDan 61.86 53.13 41.38 66.07 74.50 8.91
13 2020 AAAI CDF 59.49 64.40 50.90 81.18 90.40 3.99
14 2020 AAAI DualVD 56.32 63.23 49.25 80.23 89.70 4.11 [o-pytorch]
15 2020 CVPR CAG 56.64 63.49 49.85 80.63 90.15 4.11
16 2020 MM KBGN 57.60 64.13 50.47 80.70 90.16 4.08
17 2020 ECCV VisDial-BERT 63.87 67.50 53.85 84.68 93.25 3.32 [o-pytorch]
18 2020 ECCV LTMI 60.92 60.65 47.00 77.03 87.75 4.90
19 2020 ACL MVAN 59.37 64.84 51.45 81.12 90.65 3.97 [o-pytorch]
20 2020 EMNLP VD-BERT 59.96 65.44 51.63 82.23 90.68 3.90 [o-pytorch]
21 2021 EMNLP SGLKT 61.97 62.28 48.15 79.65 89.10 4.34
22 2022 ICASSP ICMU 61.30 66.82 53.50 83.05 92.05 3.59
23 2022 CVPR UTC 64.60 68.70 55.73 84.93 93.08 3.32
24 2022 MM AlignVD 67.23 68.17 54.57 85.65 93.38 3.23
25 2023 CVPR GST 64.91 68.44 55.05 85.18 93.35 3.23 [o-pytorch]

VisDial v1.0 test-std (with dense labels)

ID Year Venue Model (or Authors) NDCG MRR R@1 R@5 R@10 MEAN↓ code
1 2020 CVPR P1+P2 71.60 48.58 35.98 62.08 77.23 7.48 [o-pytorch]
2 2020 ECCV VisDial-BERT 74.47 50.74 37.95 64.13 80.00 6.28 [o-pytorch]
3 2020 ACL MCA 72.47 37.68 20.67 56.67 72.12 8.89 [o-pytorch]
4 2020 EMNLP VD-BERT 74.54 46.72 33.15 61.58 77.15 7.18 [o-pytorch]
5 2021 EMNLP SGLKT 72.60 58.01 46.20 71.01 83.20 5.85
6 2022 ICASSP VU-BERT 72.87 49.09 33.60 67.20 81.60 6.12
7 2022 MM AlignVD 78.70 45.75 29.50 65.70 82.45 6.64
8 2023 CVPR GST†‡ 71.76 68.09 55.18 83.68 91.93 3.57 [o-pytorch]

MNIST Dialog

ID Year Venue Model (or Authors) Accuracy code
1 2017 NeurIPS AMEM 96.39
2 2018 ECCV CorefNMN 99.30

GuessWhat?!

ID Year Venue Model (or Authors) Train err Val err Test err code
1 2017 CVPR LSTM+VGG 26.1 38.5 39.2 [o-tensorflow]
2 2017 CVPR HRED+VGG 27.4 38.4 39.6 [o-tensorflow]
3 2018 CVPR A-ATT 26.7 33.7 34.2
4 2019 ICCV HACAN 26.1 32.3 33.2

Generative Methods

VisDial v0.9 val

ID Year Venue Model (or Authors) MRR R@1 R@5 R@10 MEAN↓ code
1 2017 CVPR LF 51.99 41.83 61.78 67.59 17.07 [o-torch]
2 2017 CVPR HRE 52.37 42.29 62.18 67.92 17.07 [o-torch]
3 2017 CVPR HREA 52.42 42.28 62.33 68.17 16.79 [o-torch]
4 2017 CVPR MN 52.59 42.29 62.85 68.88 17.06 [o-torch]
5 2017 NeurIPS HCIAE 53.86 44.06 63.55 69.24 16.01 [o-pytorch]
6 2018 CVPR CoAtt 55.78 46.10 65.69 71.74 14.43
7 2018 ECCV CorefNMN 53.50 43.66 63.54 69.93 15.69
8 2019 CVPR RvA 55.43 45.37 65.27 72.97 10.71 [o-pytorch]
9 2019 IJCAI DVAN 55.94 46.58 65.50 71.25 14.79
10 2020 AAAI DMRM 55.96 46.20 66.02 72.43 13.15 [o-pytorch]
11 2020 ECCV LTMI* 55.85 46.07 65.97 72.44 14.17
12 2020 EMNLP VD-BERT 55.95 46.83 65.43 72.05 13.18 [o-pytorch]
13 2021 ACL MITVG 56.83 47.14 67.19 73.72 11.95
14 2022 ICASSP VU-BERT 54.04 44.50 62.60 71.70 12.49
15 2023 CVPR GST 60.03 50.40 70.74 77.15 12.13 [o-pytorch]

VisDial v1.0 val

ID Year Venue Model (or Authors) NDCG MRR R@1 R@5 R@10 MEAN↓ code
1 2017 CVPR MN* 56.99 47.83 38.01 57.49 64.08 18.76
2 2017 NeurIPS HCIAE* 59.70 49.07 39.72 58.23 64.73 18.43
3 2018 CVPR CoAtt* 59.24 49.64 40.09 59.37 65.92 17.86
4 2019 ACL ReDan 60.47 50.02 40.27 59.93 66.78 17.40
5 2020 AAAI DMRM - 50.16 40.15 60.02 67.21 15.19 [o-pytorch]
6 2020 IJCAI DAM 60.93 50.51 40.53 60.84 67.94 16.65 [o-pytorch]
7 2020 MM KBGN 60.42 50.05 40.40 60.11 66.82 17.54
8 2020 ECCV LTMI 63.58 50.74 40.44 61.61 69.71 14.93
9 2021 ACL MITVG 61.47 51.14 41.03 61.25 68.49 14.37
10 2022 CVPR UTC 63.86 52.22 42.56 62.40 69.51 15.67
11 2023 CVPR GST 65.47 53.19 43.08 64.09 71.51 14.34 [o-pytorch]

Paper List

  1. [VisDial] | CVPR'17 | Visual Dialog | [pdf] | [o-torch]
  2. [GuessWhat] | CVPR'17 | GuessWhat?! Visual Object Discovery through Multi-modal Dialogue | [pdf] | [o-tensorflow]
  3. [HCIAE] | NeurIPS'17 | Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model | [pdf] | [o-pytorch]
  4. [AMEM] | NeurIPS'17 | Visual Reference Resolution using Attention Memory for Visual Dialog | [pdf]
  5. [CoAtt] | CVPR'18 | Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning | [pdf]
  6. [SF] | CVPR'18 | Two Can Play This Game: Visual Dialog With Discriminative Question Generation and Answering | [pdf]
  7. [A-ATT] | CVPR'18 | Visual Grounding via Accumulated Attention | [pdf]
  8. [CorefNMN] | ECCV'18 | Visual Coreference Resolution in Visual Dialog using Neural Module Networks | [pdf]
  9. [VGNN] | CVPR'19 | Reasoning Visual Dialogs with Structural and Partial Observations | [pdf] | [o-pytorch]
  10. [Sync] | CVPR'19 | Image-Question-Answer Synergistic Network for Visual Dialog | [pdf]
  11. [RvA] | CVPR'19 | Recursive Visual Attention in Visual Dialog | [pdf] | [o-pytorch]
  12. [FGA] | CVPR'19 | Factor Graph Attention | [pdf]
  13. [DVAN] | IJCAI'19 | Dual Visual Attention Network for Visual Dialog | [pdf]
  14. [DAN] | EMNLP'19 | Dual Attention Networks for Visual Reference Resolution in Visual Dialog | [pdf] | [o-pytorch]
  15. [HACAN] | ICCV'19 | Making History Matter: History-Advantage Sequence Training for Visual Dialog | [pdf]
  16. [ReDan] | ACL'19 | Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog | [pdf]
  17. [DMRM] | AAAI'20 | DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog | [pdf] | [o-pytorch]
  18. [CDF] | AAAI'20 | Modality-Balanced Models for Visual Dialogue | [pdf]
  19. [DualVD] | AAAI'20 | DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue | [pdf] | [o-pytorch]
  20. [P1+P2] | CVPR'20 | Two Causal Principles for Improving Visual Dialog | [pdf] | [o-pytorch]
  21. [CAG] | CVPR'20 | Iterative Context-Aware Graph Inference for Visual Dialog | [pdf]
  22. [DAM] | IJCAI'20 | DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue | [pdf] | [o-pytorch]
  23. [KBGN] | MM'20 | KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue | [pdf]
  24. [VisDial-BERT] | ECCV'20 | Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline | [pdf] | [o-pytorch]
  25. [LTMI] | ECCV'20 | Efficient Attention Mechanism for Visual Dialog that Can Handle All the Interactions between Multiple Inputs | [pdf] | [o-pytorch]
  26. [MVAN] | ACL'20 | Multi-View Attention Network for Visual Dialog | [pdf] | [o-pytorch]
  27. [MCA] | ACL'20 | History for Visual Dialog: Do we really need it? | [pdf] | [o-pytorch]
  28. [VD-BERT] | EMNLP'20 | VD-BERT: A Unified Vision and Dialog Transformer with BERT | [pdf] | [o-pytorch]
  29. [MITVG] | ACL'21 | Multimodal Incremental Transformer with Visual Grounding for Visual Dialogue Generation | [pdf]
  30. [SGLKT] | EMNLP'21 | Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer | [pdf]
  31. [VU-BERT] | ICASSP'22 | VU-BERT: A Unified Framework for Visual Dialog | [pdf]
  32. [ICMU] | ICASSP'22 | Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning | [pdf]
  33. [UTC] | CVPR'22 | UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog | [pdf]
  34. [AlignVD] | MM'22 | Unsupervised and Pseudo-Supervised Vision-Language Alignment in Visual Dialog | [pdf]
  35. [GST] | CVPR'23 | The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training | [pdf] | [o-pytorch]

Feedback

If you have any suggestions or find missing papers, please feel free to contact me.
