Awesome Visual Dialog

A curated publication list on visual dialog.

This repository was built to make it easier to navigate the mainstream literature on visual dialog.
Please note that only accepted papers are included (for reliability), and only conference papers (for brevity).

Last updated: 2023/5/7 (not finished yet)

Performance Tables

Visual dialog models for both the generative and the discriminative tasks are evaluated with retrieval-based metrics: mean reciprocal rank (MRR), recall@k (R@k), mean rank (Mean), and normalized discounted cumulative gain (NDCG). Specifically, every question in a VisDial dialog comes with a list of 100 candidate answers, exactly one of which is the ground-truth answer. The model ranks the candidates by their scores (e.g., log-likelihoods) and is then evaluated with the four metrics. MRR, R@k, and Mean depend only on the rank of the single ground-truth answer, whereas NDCG accounts for all relevant answers in the 100-candidate list by using densely annotated relevance scores for every candidate. Lower is better for Mean (marked ↓ in the tables); higher is better for the other metrics. The community regards NDCG as the primary evaluation metric.
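The metric definitions above can be made concrete with a short sketch. This is a minimal, dependency-free Python illustration written for this list, not the official VisDial evaluation code, and the function names (`ground_truth_rank`, `retrieval_metrics`, `ndcg`) are hypothetical. Two assumptions are worth flagging: ties in the scores are resolved in favour of the ground-truth answer, and NDCG is computed at a cutoff K equal to the number of candidates with non-zero relevance, which is how we understand the usual VisDial convention. Note that the tables report MRR, R@k, and NDCG multiplied by 100.

```python
import math
from typing import Dict, Sequence


def ground_truth_rank(scores: Sequence[float], gt_index: int) -> int:
    """Return the 1-based rank of the ground-truth answer among the candidates.

    Candidates are ranked by score in descending order (e.g. log-likelihoods).
    Assumption: ties are resolved in favour of the ground-truth answer.
    """
    gt_score = scores[gt_index]
    return 1 + sum(1 for s in scores if s > gt_score)


def retrieval_metrics(ranks: Sequence[int], ks: Sequence[int] = (1, 5, 10)) -> Dict[str, float]:
    """MRR, Mean, and R@k computed from the ground-truth rank of each question."""
    n = len(ranks)
    metrics = {
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank
        "Mean": sum(ranks) / n,                   # mean rank (lower is better)
    }
    for k in ks:
        # Fraction of questions whose ground truth lands in the top k.
        metrics[f"R@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


def ndcg(scores: Sequence[float], relevance: Sequence[float]) -> float:
    """NDCG for one question, using a dense relevance score for every candidate.

    Assumption: the cutoff K is the number of candidates with non-zero relevance.
    """
    k = sum(1 for r in relevance if r > 0)
    if k == 0:
        raise ValueError("NDCG is undefined when no candidate is relevant")
    # Candidate indices in the order the model ranks them (best first).
    predicted = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ideal = sorted(relevance, reverse=True)
    # The candidate at 0-based position p is discounted by log2(p + 2).
    dcg = sum(relevance[i] / math.log2(p + 2) for p, i in enumerate(predicted[:k]))
    idcg = sum(rel / math.log2(p + 2) for p, rel in enumerate(ideal[:k]))
    return dcg / idcg


if __name__ == "__main__":
    # Toy example with 5 candidates per question instead of VisDial's 100.
    scores_q1 = [0.1, 0.9, 0.3, 0.7, 0.5]  # ground truth is candidate 3 -> rank 2
    scores_q2 = [2.0, 1.0, 0.5, 0.2, 0.1]  # ground truth is candidate 0 -> rank 1
    ranks = [ground_truth_rank(scores_q1, 3), ground_truth_rank(scores_q2, 0)]
    print(retrieval_metrics(ranks))
    # prints {'MRR': 0.75, 'Mean': 1.5, 'R@1': 0.5, 'R@5': 1.0, 'R@10': 1.0}
    print(ndcg(scores_q1, [0.0, 0.5, 0.0, 1.0, 0.0]))
```

The functions do not depend on the number of candidates, so the same code applies to VisDial's 100-candidate lists. The contrast between the two metric families is visible here: MRR, R@k, and Mean only look at one index per question, while NDCG needs the full relevance vector, which is why NDCG requires the dense annotations.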

In addition, links to implementations are provided together with their framework, where available. 'o-' and 'u-' denote official and unofficial implementations, respectively.

[Note]
*: re-implemented results
†: use of dense labels
‡: use of additional knowledge

Discriminative Methods

VisDial v0.9 val

ID	Year	Venue	Model (or Authors)	MRR	R@1	R@5	R@10	MEAN↓	code
1	2017	CVPR	LF	58.07	43.82	74.68	84.07	5.78	`[o-torch]`
2	2017	CVPR	HRE	58.46	44.67	74.50	84.22	5.72	`[o-torch]`
3	2017	CVPR	HREA	58.68	44.82	74.81	84.36	5.66	`[o-torch]`
4	2017	CVPR	MN	59.65	45.55	76.22	85.37	5.46	`[o-torch]`
5	2017	NeurIPS	HCIAE	62.22	48.48	78.75	87.59	4.81	`[o-pytorch]`
6	2017	NeurIPS	AMEM	62.27	48.53	78.66	87.43	4.86
7	2018	CVPR	CoAtt	63.98	50.29	80.71	88.81	4.47
8	2018	CVPR	SF	62.42	48.55	78.96	87.75	4.70
9	2018	ECCV	CorefNMN	64.10	50.92	80.18	88.81	4.45
10	2019	CVPR	VGNN	62.85	48.95	79.65	88.36	4.57	`[o-pytorch]`
11	2019	CVPR	RvA	66.34	52.71	82.97	90.73	3.93	`[o-pytorch]`
12	2019	CVPR	FGA	65.25	51.43	82.08	89.56	4.35
13	2019	IJCAI	DVAN	66.67	53.62	82.85	90.72	3.93
14	2019	EMNLP	DAN	66.38	53.33	82.42	90.38	4.04	`[o-pytorch]`
15	2019	ICCV	HACAN	67.92	54.76	83.03	90.68	3.97
16	2020	AAAI	DualVD	62.94	48.64	80.89	89.94	4.17	`[o-pytorch]`
17	2020	CVPR	CAG	67.56	54.64	83.72	91.48	3.75
18	2020	ACL	MVAN	67.65	54.65	83.85	91.47	3.73	`[o-pytorch]`
19	2020	EMNLP	VD-BERT	70.04	57.79	85.34	92.68	4.04	`[o-pytorch]`
20	2022	ICASSP	VU-BERT	63.33	48.71	81.03	89.10	4.19
21	2022	MM	AlignVD	71.65	59.64	88.30	94.72	2.96

VisDial v1.0 val

ID	Year	Venue	Model (or Authors)	NDCG	MRR	R@1	R@5	R@10	MEAN↓	code
1	2017	CVPR	MN*	55.13	60.42	46.09	78.14	88.05	4.63
2	2017	NeurIPS	HCIAE*	57.65	62.96	48.94	80.50	89.66	4.24
3	2018	CVPR	CoAtt*	57.72	62.91	48.86	80.41	89.83	4.21
4	2019	ACL	ReDan	59.32	64.21	50.60	81.39	90.26	4.05
5	2020	ECCV	VisDial-BERT	64.94	69.10	55.88	85.50	93.29	3.25	`[o-pytorch]`
6	2020	ECCV	LTMI	62.72	62.32	48.94	78.65	87.88	4.86
7	2020	ACL	MVAN	60.17	65.33	51.86	82.40	90.90	3.88	`[o-pytorch]`
8	2020	ACL	MCA	60.27	64.33	51.12	80.91	89.65	4.24	`[o-pytorch]`
9	2020	EMNLP	VD-BERT	63.22	67.44	54.02	83.96	92.33	3.53
10	2021	EMNLP	SGLKT	63.41	63.34	-	-	-	-
11	2021	EMNLP	SGLKT†	74.54	59.10	-	-	-	-
12	2022	ICASSP	ICMU	64.30	69.14	56.80	85.09	93.42	3.37
13	2022	CVPR	UTC	63.22	68.58	55.48	85.38	93.20	3.28
14	2022	MM	AlignVD	67.22	70.45	57.64	87.06	94.20	3.05

VisDial v1.0 val (with dense labels)

ID	Year	Venue	Model (or Authors)	NDCG	MRR	R@1	R@5	R@10	MEAN↓	code
1	2020	CVPR	P1+P2†	73.63	50.56	37.99	63.98	77.95	7.26	`[o-pytorch]`
2	2020	ECCV	VisDial-BERT†	75.24	52.22	39.92	65.05	80.63	6.17	`[o-pytorch]`
3	2020	ACL	MCA†	72.18	46.92	32.09	63.85	78.06	7.37	`[o-pytorch]`

VisDial v1.0 test-std

ID	Year	Venue	Model (or Authors)	NDCG	MRR	R@1	R@5	R@10	MEAN↓	code
1	2017	CVPR	LF	45.21	55.42	40.95	72.45	82.83	5.95	`[o-torch]`
2	2017	CVPR	HRE	45.46	54.16	39.93	70.45	81.50	6.41	`[o-torch]`
3	2017	CVPR	MN	47.50	55.49	40.98	72.30	83.30	5.92	`[o-torch]`
4	2018	ECCV	CorefNMN	54.70	61.50	47.55	78.10	88.80	4.40
5	2019	CVPR	VGNN	52.82	61.37	47.33	77.98	87.83	4.57	`[o-pytorch]`
6	2019	CVPR	Sync	57.32	62.20	47.90	80.43	89.95	4.17
7	2019	CVPR	RvA	55.59	63.03	49.03	80.40	89.83	4.18	`[o-pytorch]`
8	2019	CVPR	FGA	52.10	63.70	49.58	80.97	88.55	4.51
9	2019	IJCAI	DVAN	54.70	62.58	48.90	79.35	89.03	4.36
10	2019	EMNLP	DAN	57.59	63.20	49.63	79.75	89.35	4.30	`[o-pytorch]`
11	2019	ICCV	HACAN	57.17	64.22	50.88	80.63	89.45	4.20
12	2019	ACL	ReDan	61.86	53.13	41.38	66.07	74.50	8.91
13	2020	AAAI	CDF	59.49	64.40	50.90	81.18	90.40	3.99
14	2020	AAAI	DualVD	56.32	63.23	49.25	80.23	89.70	4.11	`[o-pytorch]`
15	2020	CVPR	CAG	56.64	63.49	49.85	80.63	90.15	4.11
16	2020	MM	KBGN	57.60	64.13	50.47	80.70	90.16	4.08
17	2020	ECCV	VisDial-BERT	63.87	67.50	53.85	84.68	93.25	3.32	`[o-pytorch]`
18	2020	ECCV	LTMI	60.92	60.65	47.00	77.03	87.75	4.90
19	2020	ACL	MVAN	59.37	64.84	51.45	81.12	90.65	3.97	`[o-pytorch]`
20	2020	EMNLP	VD-BERT	59.96	65.44	51.63	82.23	90.68	3.90	`[o-pytorch]`
21	2021	EMNLP	SGLKT	61.97	62.28	48.15	79.65	89.10	4.34
22	2022	ICASSP	ICMU	61.30	66.82	53.50	83.05	92.05	3.59
23	2022	CVPR	UTC	64.60	68.70	55.73	84.93	93.08	3.32
24	2022	MM	AlignVD	67.23	68.17	54.57	85.65	93.38	3.23
25	2023	CVPR	GST‡	64.91	68.44	55.05	85.18	93.35	3.23	`[o-pytorch]`

VisDial v1.0 test-std (with dense labels)

ID	Year	Venue	Model (or Authors)	NDCG	MRR	R@1	R@5	R@10	MEAN↓	code
1	2020	CVPR	P1+P2†	71.60	48.58	35.98	62.08	77.23	7.48	`[o-pytorch]`
2	2020	ECCV	VisDial-BERT†	74.47	50.74	37.95	64.13	80.00	6.28	`[o-pytorch]`
3	2020	ACL	MCA†	72.47	37.68	20.67	56.67	72.12	8.89	`[o-pytorch]`
4	2020	EMNLP	VD-BERT†	74.54	46.72	33.15	61.58	77.15	7.18	`[o-pytorch]`
5	2021	EMNLP	SGLKT†	72.60	58.01	46.20	71.01	83.20	5.85
6	2022	ICASSP	VU-BERT†	72.87	49.09	33.60	67.20	81.60	6.12
7	2022	MM	AlignVD†	78.70	45.75	29.50	65.70	82.45	6.64
8	2023	CVPR	GST†‡	71.76	68.09	55.18	83.68	91.93	3.57	`[o-pytorch]`

MNIST Dialog

ID	Year	Venue	Model (or Authors)	Accuracy (%)	code
1	2017	NeurIPS	AMEM	96.39
2	2018	ECCV	CorefNMN	99.30

GuessWhat?!

ID	Year	Venue	Model (or Authors)	Train err (%)	Val err (%)	Test err (%)	code
1	2017	CVPR	LSTM+VGG	26.1	38.5	39.2	`[o-tensorflow]`
2	2017	CVPR	HRED+VGG	27.4	38.4	39.6	`[o-tensorflow]`
3	2018	CVPR	A-ATT	26.7	33.7	34.2
4	2019	ICCV	HACAN	26.1	32.3	33.2

Generative Methods

VisDial v0.9 val

ID	Year	Venue	Model (or Authors)	MRR	R@1	R@5	R@10	MEAN↓	code
1	2017	CVPR	LF	51.99	41.83	61.78	67.59	17.07	`[o-torch]`
2	2017	CVPR	HRE	52.37	42.29	62.18	67.92	17.07	`[o-torch]`
3	2017	CVPR	HREA	52.42	42.28	62.33	68.17	16.79	`[o-torch]`
4	2017	CVPR	MN	52.59	42.29	62.85	68.88	17.06	`[o-torch]`
5	2017	NeurIPS	HCIAE	53.86	44.06	63.55	69.24	16.01	`[o-pytorch]`
6	2018	CVPR	CoAtt	55.78	46.10	65.69	71.74	14.43
7	2018	ECCV	CorefNMN	53.50	43.66	63.54	69.93	15.69
8	2019	CVPR	RvA	55.43	45.37	65.27	72.97	10.71	`[o-pytorch]`
9	2019	IJCAI	DVAN	55.94	46.58	65.50	71.25	14.79
10	2020	AAAI	DMRM	55.96	46.20	66.02	72.43	13.15	`[o-pytorch]`
11	2020	ECCV	LTMI*	55.85	46.07	65.97	72.44	14.17
12	2020	EMNLP	VD-BERT	55.95	46.83	65.43	72.05	13.18	`[o-pytorch]`
13	2021	ACL	MITVG	56.83	47.14	67.19	73.72	11.95
14	2022	ICASSP	VU-BERT	54.04	44.50	62.60	71.70	12.49
15	2023	CVPR	GST‡	60.03	50.40	70.74	77.15	12.13	`[o-pytorch]`

VisDial v1.0 val

ID	Year	Venue	Model (or Authors)	NDCG	MRR	R@1	R@5	R@10	MEAN↓	code
1	2017	CVPR	MN*	56.99	47.83	38.01	57.49	64.08	18.76
2	2017	NeurIPS	HCIAE*	59.70	49.07	39.72	58.23	64.73	18.43
3	2018	CVPR	CoAtt*	59.24	49.64	40.09	59.37	65.92	17.86
4	2019	ACL	ReDan	60.47	50.02	40.27	59.93	66.78	17.40
5	2020	AAAI	DMRM	-	50.16	40.15	60.02	67.21	15.19	`[o-pytorch]`
6	2020	IJCAI	DAM	60.93	50.51	40.53	60.84	67.94	16.65	`[o-pytorch]`
7	2020	MM	KBGN	60.42	50.05	40.40	60.11	66.82	17.54
8	2020	ECCV	LTMI	63.58	50.74	40.44	61.61	69.71	14.93
9	2021	ACL	MITVG	61.47	51.14	41.03	61.25	68.49	14.37
10	2022	CVPR	UTC	63.86	52.22	42.56	62.40	69.51	15.67
11	2023	CVPR	GST‡	65.47	53.19	43.08	64.09	71.51	14.34	`[o-pytorch]`

Paper List

[VisDial] | CVPR'17 | Visual Dialog | [pdf] | [o-torch]
[GuessWhat] | CVPR'17 | GuessWhat?! Visual Object Discovery through Multi-Modal Dialogue | [pdf] | [o-tensorflow]
[HCIAE] | NIPS'17 | Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model | [pdf] | [o-pytorch]
[AMEM] | NIPS'17 | Visual Reference Resolution using Attention Memory for Visual Dialog | [pdf]
[CoAtt] | CVPR'18 | Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning | [pdf]
[SF] | CVPR'18 | Two Can Play This Game: Visual Dialog With Discriminative Question Generation and Answering | [pdf]
[A-ATT] | CVPR'18 | Visual Grounding via Accumulated Attention | [pdf]
[CorefNMN] | ECCV'18 | Visual Coreference Resolution in Visual Dialog using Neural Module Networks | [pdf]
[VGNN] | CVPR'19 | Reasoning Visual Dialogs with Structural and Partial Observations | [pdf] | [o-pytorch]
[Sync] | CVPR'19 | Image-Question-Answer Synergistic Network for Visual Dialog | [pdf]
[RvA] | CVPR'19 | Recursive Visual Attention in Visual Dialog | [pdf] | [o-pytorch]
[FGA] | CVPR'19 | Factor Graph Attention | [pdf]
[DVAN] | IJCAI'19 | Dual Visual Attention Network for Visual Dialog | [pdf]
[DAN] | EMNLP'19 | Dual Attention Networks for Visual Reference Resolution in Visual Dialog | [pdf] | [o-pytorch]
[HACAN] | ICCV'19 | Making History Matter: History-Advantage Sequence Training for Visual Dialog | [pdf]
[ReDan] | ACL'19 | Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog | [pdf]
[DMRM] | AAAI'20 | DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog | [pdf] | [o-pytorch]
[CDF] | AAAI'20 | Modality-Balanced Models for Visual Dialogue | [pdf]
[DualVD] | AAAI'20 | DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue | [pdf] | [o-pytorch]
[P1+P2] | CVPR'20 | Two Causal Principles for Improving Visual Dialog | [pdf] | [o-pytorch]
[CAG] | CVPR'20 | Iterative Context-Aware Graph Inference for Visual Dialog | [pdf]
[DAM] | IJCAI'20 | DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue | [pdf] | [o-pytorch]
[KBGN] | MM'20 | KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue | [pdf]
[VisDial-BERT] | ECCV'20 | Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline | [pdf] | [o-pytorch]
[LTMI] | ECCV'20 | Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs | [pdf] | [o-pytorch]
[MVAN] | ACL'20 | Multi-View Attention Network for Visual Dialog | [pdf] | [o-pytorch]
[MCA] | ACL'20 | History for Visual Dialog: Do we really need it? | [pdf] | [o-pytorch]
[VD-BERT] | EMNLP'20 | VD-BERT: A Unified Vision and Dialog Transformer with BERT | [pdf] | [o-pytorch]
[MITVG] | ACL'21 | Multimodal Incremental Transformer with Visual Grounding for Visual Dialogue Generation | [pdf]
[SGLKT] | EMNLP'21 | Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer | [pdf]
[VU-BERT] | ICASSP'22 | VU-BERT: A Unified framework for Visual Dialog | [pdf]
[ICMU] | ICASSP'22 | Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning | [pdf]
[UTC] | CVPR'22 | UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog | [pdf]
[AlignVD] | MM'22 | Unsupervised and Pseudo-Supervised Vision-Language Alignment in Visual Dialog | [pdf]
[GST] | CVPR'23 | The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training | [pdf] | [o-pytorch]

Feedback

If you have any suggestions or find missing papers, please feel free to contact me.

MengyuanChen21/Awesome-Visual-Dialog

Folders and files

Latest commit

History

Repository files navigation

Awesome Visual Dialog

Table of Contents

Performance Tables

Discriminative Methods

VisDial v0.9 val

VisDial v1.0 val

VisDial v1.0 val (with dense labels)

VisDial v1.0 test-std

VisDial v1.0 test-std (with dense labels)

MNIST Dialog

GuessWhat?!

Generative Methods

VisDial v0.9 val

VisDial v1.0 val

Paper List

Feedback

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages