GitHub - kyuyeonpooh/objects-that-sound: The unofficial implementation of paper, "Objects that Sound", from ECCV 2018.

Objects that Sound

This repository contains an unofficial but full reproduction of the paper "Objects that Sound" from ECCV 2018. 😊

Environment

We implement this work in PyTorch with Python 3.6, and we strongly recommend using Ubuntu 16.04.

Training the models on 48,000 videos took a little under 20 hours on an Intel i7-9700K CPU and an RTX 2080 Ti GPU.

For detailed package requirements, please refer to requirements.txt.

Model

We implement AVE-Net and AVOL-Net, as well as L³-Net, which is one of the baseline models.
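As a rough illustration of the idea behind AVE-Net, the sketch below scores an image embedding against an audio embedding by the Euclidean distance between their L2-normalized vectors, then maps that distance to a correspondence probability. It is a framework-free sketch, not the code in this repository: the function names are hypothetical, the `weight` and `bias` values are made-up stand-ins for learned parameters, and the two-way softmax of the original model is collapsed into a single logistic function for brevity.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correspondence_probability(image_emb, audio_emb, weight=-4.0, bias=2.0):
    """Map the distance between normalized embeddings to a probability.

    weight and bias are illustrative assumptions standing in for the
    parameters a trained model would learn; they are not from the paper.
    """
    d = euclidean_distance(l2_normalize(image_emb), l2_normalize(audio_emb))
    return 1.0 / (1.0 + math.exp(-(weight * d + bias)))
```

With a negative `weight`, a small distance gives a probability near 1 (the frame and the audio likely correspond) and a large distance gives a probability near 0.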

Dataset

We use or construct three different datasets to train and evaluate the models.

AudioSet-Instruments
AudioSet-Animal
AVE-Dataset

Due to resource limitations, we use a 20% subset of AudioSet-Instruments suggested in the paper.
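One simple way to draw such a subset reproducibly is seeded random sampling over the video IDs. The sketch below is only an illustration under that assumption; how the subset was actually chosen for this project is described in the Final Report, and `sample_subset` and its parameters are hypothetical names.

```python
import random


def sample_subset(video_ids, fraction=0.2, seed=0):
    """Return a reproducible random subset of video_ids.

    fraction and seed are illustrative defaults, not project settings.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ids = list(video_ids)
    if not ids:
        return []
    rng = random.Random(seed)  # private generator, so global state is untouched
    k = max(1, round(len(ids) * fraction))
    return sorted(rng.sample(ids, k))
```

Fixing the seed means every run trains and evaluates on the same videos, which makes results comparable across experiments.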

Please refer to the Final Report for more detailed explanations of each dataset.

Results

We note that our results are quite different from those reported in the paper.

We expect that this gap comes from differences in dataset size, batch size, learning rate, or other subtle details of the training configuration.

1. Accuracy on AVC Task
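In the AVC (audio-visual correspondence) task, the model decides whether an image frame and an audio clip belong together, so accuracy is the fraction of pairs classified correctly. A minimal sketch of that computation follows; the function name and the 1/0 label encoding are assumptions made for illustration.

```python
def avc_accuracy(predictions, labels):
    """Fraction of frame/audio pairs whose predicted label matches the truth.

    Labels are assumed to be 1 (corresponding pair) or 0 (mismatched pair).
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one pair is required")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)
```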

2. Cross-modal Retrieval (Qualitative)

More qualitative results are available in our slides provided below.
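Cross-modal retrieval can be sketched as a nearest-neighbor search in the shared embedding space: embed the query from one modality, then rank items from the other modality by distance. The example below assumes Euclidean distance over precomputed embeddings; `rank_by_distance` and the gallery layout are hypothetical and do not mirror the scripts in this repository.

```python
import math


def rank_by_distance(query_emb, gallery, top_k=None):
    """Return gallery item names ordered from nearest to farthest.

    gallery maps an item name to its embedding (a list of floats).
    top_k, if given, truncates the ranking to the closest k items.
    """
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    ranked = sorted(gallery, key=lambda name: distance(query_emb, gallery[name]))
    return ranked if top_k is None else ranked[:top_k]
```

For example, an audio query embedding can be ranked against a gallery of image embeddings to retrieve the frames most likely to produce that sound.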

3. Cross-modal Retrieval (Quantitative)

4. Sound Localization (Qualitative)

5. Embedding Visualization with t-SNE

Embeddings marked with ● are from images, and embeddings marked with × are from audio.

Acknowledgement

We gained many implementation insights from this repository, thanks to @rohitrango.

Supplementary Material

As this was made for our course project, we have prepared PPT slides with corresponding presentation videos.

Name	Slide	Video
Project Proposal	PPTX	YouTube
Progress Update	PPTX	YouTube
Final Presentation	PPTX	YouTube

Also, please refer to our Final Report for a detailed explanation of the implementation and training configurations.

Contact

We welcome any questions and issues about the implementation. If you have any, please contact us at the e-mail addresses below or open an issue.

Contributor	E-mail
Kyuyeon Kim	[email protected]
Hyeongryeol Ryu	[email protected]
Yeonjae Kim	[email protected]

Comment

If you find this repository useful, please Star ⭐ or Fork 🍴 it.

Folders and files

csv
json
material
metadata
model
save
utils
.gitignore
LICENSE
README.md
_config.yml
audio_2_audio_queries.py
config.ini
cross_modal_queries.py
generate_embeddings_video_audio.py
image_2_image_queries.py
localize_sound.py
preprocess.py
requirements.txt
test.py
train.py
tsne.py
