
# Multi-Modal-ML

This repo is updated occasionally but continuously, collecting papers related to Multi-Modal Machine Learning applications.

## Fancy Applications

This part collects fancy applications that rely on Multi-Modal Machine Learning, especially modality combinations that are NOT yet well studied; common and popular tasks such as Visual Question Answering and Image Caption Generation are not the focus here. If only one modality is listed, the paper introduces how to obtain data for that modality.

| Year | Venue | Paper | Modalities | Project/Code |
| --- | --- | --- | --- | --- |
| 2020 | ECCV | Multiple Sound Sources Localization from Coarse to Fine | Vision+Sound | - |
| 2020 | CVPR | Music Gesture for Visual Sound Separation | Vision+Sound | Project |
| 2020 | ICASSP | Sight to Sound: An End-to-end Approach for Visual Piano Transcription | Vision+Sound | Project |
| 2019 | CVPR | Connecting Touch and Vision via Cross-Modal Prediction | Vision+Touch | Project/Code |
| 2019 | Nature | Learning the signatures of the human grasp using a scalable tactile glove | Touch | Project/Code |
| 2019 | IJCV | Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning | Vision+Sound | - |
| 2019 | NC | Real-time decoding of question-and-answer speech dialogue using human cortical activity | Speech+ECoG | - |
| 2017 | CVPR | Lip Reading Sentences in the Wild | Vision+Speech | - |
| 2016 | ECCV | Ambient Sound Provides Supervision for Visual Learning | Vision+Sound | - |
| 2016 | CVPR | Visually Indicated Sounds | Vision+Sound | Project |

## Employing Extra Modalities to Boost Traditional Tasks

This part collects papers that introduce new modalities into traditional tasks.

| Year | Venue | Paper | Task | Basic Modality | New Modality |
| --- | --- | --- | --- | --- | --- |
| 2020 | ECCV | Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision | Detection | Vision | Sound |
| 2019 | ICCV | Self-Supervised Moving Vehicle Tracking With Stereo Sound | Tracking | Vision | Sound |
| 2019 | ICCVW | DECCNet: Depth Enhanced Crowd Counting | Counting | Vision | Depth |
| 2019 | CVPRW | WiFi and Vision Multimodal Learning for Accurate and Robust Device-Free Human Activity Recognition | Recognition | Vision | WiFi |

## Datasets

This part lists some large-scale datasets that include multi-modal annotations.

| Year | Dataset | Modalities | Project | Paper |
| --- | --- | --- | --- | --- |
| 2020 | VGG-Sound | Vision+Sound | Project | ICASSP |
| 2017 | Lip Reading in the Wild | Vision+Speech | Project | ACCV |
| 2016 | Cross-Modal Places | Vision+Language | Project | CVPR & T-PAMI |

## Tutorial/Workshop/Survey

This part lists some extra resources about Multi-Modal Machine Learning.

### Survey Papers

| Year | Venue | Title |
| --- | --- | --- |
| 2018 | T-PAMI | Multimodal Machine Learning: A Survey and Taxonomy |

### Workshops

| Year | Venue | Title | Proceedings |
| --- | --- | --- | --- |
| 2020 | CVPR | Workshop on Multimodal Learning | Proceedings |
| 2019 | ICCV | Cross-Modal Learning in Real World | Proceedings |
| 2019 | CVPR | 2nd Multimodal Learning and Applications Workshop (MULA) | Proceedings |
| 2018 | ECCV | 1st Multimodal Learning and Applications Workshop (MULA) | Proceedings |

### Tutorials

| Year | Venue | Title |
| --- | --- | --- |
| 2016 | CVPR | Multimodal Machine Learning tutorial |

## People

This part lists researchers who are actively working on Multi-Modal Machine Learning.

| Name | Affiliation | Research Interests | Google Scholar |
| --- | --- | --- | --- |
| Antonio Torralba | MIT | Vision+Audition+Touch | Scholar |
| Andrew Zisserman | Oxford | Vision+Audio | Scholar |
| Andrea Vedaldi | Oxford | Vision+Audio | Scholar |

## Contact

If you are also interested in Multi-Modal Machine Learning and would like to recommend papers or projects for this repo, feel free to open an issue or make a pull request.
