
Detailed preprocess dataset and format about AVE-Dataset #7

Open
JackHenry1992 opened this issue Nov 25, 2020 · 2 comments
Labels
good first issue Good for newcomers

Comments


JackHenry1992 commented Nov 25, 2020

Thanks for sharing your great work.
Could you provide the detailed process for preprocessing the AVE dataset?

@JackHenry1992 JackHenry1992 reopened this Nov 25, 2020
@JackHenry1992 JackHenry1992 changed the title Preprocess dataset Detailed preprocess dataset and format about AVE-Dataset Nov 25, 2020
JackHenry1992 (Author) commented:

I have processed the AVE-Dataset using preprocess.py and generated the training set, but the loss did not decrease during the training phase.
Loss epoch: 32, step: 79, train_loss: 0.8976, train_acc: 0.4969, lr:0.000010


kyuyeonpooh (Owner) commented Nov 26, 2020

Hi,

Thank you for your interest in my code and project.

Data preprocessing

In my case, I first downloaded the videos directly from YouTube using youtube_dl and saved each video as [YouTube ID of video].mp4.
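The download step above can be sketched roughly as follows. This is an illustrative assumption, not code from the repository: the video IDs are placeholders, and the option values are one plausible configuration for saving files as [YouTube ID].mp4.

```python
# Hypothetical sketch of the download step; IDs below are placeholders
# standing in for entries from the AVE/AudioSet annotation file.
video_ids = ["abc123defgh", "xyz987uvwxy"]

ydl_opts = {
    "format": "mp4",              # prefer an mp4 container
    "outtmpl": "%(id)s.%(ext)s",  # output name becomes "<YouTube ID>.mp4"
    "ignoreerrors": True,         # skip clips that are no longer available
}
urls = ["https://www.youtube.com/watch?v=" + vid for vid in video_ids]

# With youtube_dl installed, the actual download would then be:
#   import youtube_dl
#   with youtube_dl.YoutubeDL(ydl_opts) as ydl:
#       ydl.download(urls)
print(urls[0])
```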

With the above naming convention, once you configure the path settings in the config.ini file and run preprocess.py:

  • For images, you get [YouTube ID of video].npz files, each containing 10 frames from a single video (extracted at 1 fps). The frames are resized to 256x256 in the default settings.
  • For audio, you get [YouTube ID of video].npz files, each containing 10 one-second spectrograms from a single video.
    I extract 10 samples because the video clips in AudioSet are 10 seconds long.

For more details, refer to utils/extractor.py.
You can also adjust the behavior by changing the parameters of the Extractor class methods.
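A minimal sketch of the 1-fps sampling idea described above. This is an illustrative assumption, not the repository's actual utils/extractor.py: the function name frame_indices and the array layout are made up for the example.

```python
import numpy as np

def frame_indices(video_fps, duration_s=10):
    # Keep one frame per second of the clip: the frame closest
    # to each whole-second mark.
    return [int(round(sec * video_fps)) for sec in range(duration_s)]

# e.g. a 30-fps, 10-second clip keeps frames 0, 30, 60, ..., 270
idx = frame_indices(30.0)

# The per-video .npz then holds 10 frames resized to 256x256
# (assumed HWC uint8 layout for this sketch).
frames = np.zeros((10, 256, 256, 3), dtype=np.uint8)
print(len(idx), frames.shape)
```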


Loss not decreasing

I have also faced this issue. It seems to occur because the last fully connected layer is tiny, and therefore much more vulnerable to noisy data than the other layers. Once that layer is misguided, it may never recover to the expected state.

Here are several tips that might help. However, please keep in mind that the network does not always train successfully even if you apply all of the solutions below.

1. Learning Rate
I found that a learning rate less than or equal to 5e-5 was helpful for successful training.
Training with a learning rate greater than 1e-4 is highly likely to fail.

2. Use a Larger Batch
Using a larger batch size usually helps training, as the data in AudioSet is quite noisy.
In my case, I use a batch size of 64.
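Tips 1 and 2 together might look like the setup below. The optimizer choice (Adam) and the stand-in model and data are assumptions for illustration; only the learning rate bound (≤ 5e-5) and the batch size (64) come from the tips above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(1, 2)  # stand-in for the real network, just for the sketch

# Tip 1: keep the learning rate at or below 5e-5.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)

# Tip 2: use a larger batch (64 here) to average out noisy AudioSet samples.
dataset = TensorDataset(torch.randn(256, 1), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=64, shuffle=True)
```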

3. In case of training AVE-Net: tweak the parameters of the last fully connected layer
As you can see in models/avenet.py, self.fc3 in AVE-Net has only 4 parameters (an nn.Linear(1, 2) layer: 2 weights and 2 biases).
Because this tiny layer is very sensitive to gradients, I initialize it with fixed values to make it more robust to noisy data.

Please change this part as shown below.

# fusion network (before)
self.fc3 = nn.Linear(1, 2)

# fusion network (after: fixed initialization)
self.fc3 = nn.Linear(1, 2)
self.fc3.weight.data[0] = -0.7
self.fc3.weight.data[1] = 0.7
self.fc3.bias.data[0] = 1.2
self.fc3.bias.data[1] = -1.2

4. One more tip
In my case, once the loss decreased below 0.69, the training went on successfully.
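For context on the 0.69 figure (my reading, not stated in the thread): a balanced binary classifier predicting at chance has a cross-entropy loss of ln 2 ≈ 0.693, so the loss falling below 0.69 suggests the network has started doing better than random guessing. A quick check:

```python
import math

# Cross-entropy of a uniform prediction (p = 0.5) on a binary task:
chance_loss = -math.log(0.5)   # equals ln 2
print(round(chance_loss, 4))   # 0.6931
```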


Comment: Pretrained models are available! Please use them if you need them.

If you have any questions or further issues, feel free to contact me.
You can also open issues in the repository; I will check them right away.

Sincerely, Kyuyeon.

@kyuyeonpooh kyuyeonpooh added the good first issue Good for newcomers label Nov 27, 2020