
Detailed preprocess dataset and format about AVE-Dataset #7

Open
JackHenry1992 opened this issue Nov 25, 2020 · 2 comments
Labels
good first issue Good for newcomers

Comments


JackHenry1992 commented Nov 25, 2020

Thanks for sharing your great work.
Could you provide the detailed process for preprocessing the AVE dataset?

@JackHenry1992 JackHenry1992 reopened this Nov 25, 2020
@JackHenry1992 JackHenry1992 changed the title Preprocess dataset Detailed preprocess dataset and format about AVE-Dataset Nov 25, 2020
JackHenry1992 (Author) commented:

I have processed the AVE-Dataset using preprocess.py and generated the training set, but the loss did not decrease during the training phase.
Loss epoch: 32, step: 79, train_loss: 0.8976, train_acc: 0.4969, lr:0.000010


kyuyeonpooh (Owner) commented Nov 26, 2020

Hi,

Thank you for your interest in my code and project.

Data preprocessing

In my case, I first downloaded the videos directly from YouTube using youtube_dl and saved each video as [YouTube ID of video].mp4.
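The download step above can be sketched roughly as follows. This is an illustrative assumption, not code from the repository: the video IDs are placeholders, and the option values are one plausible configuration for saving files as [YouTube ID].mp4.

```python
# Hypothetical sketch of the download step; IDs below are placeholders
# standing in for entries from the AVE/AudioSet annotation file.
video_ids = ["abc123defgh", "xyz987uvwxy"]

ydl_opts = {
    "format": "mp4",              # prefer an mp4 container
    "outtmpl": "%(id)s.%(ext)s",  # output name becomes "<YouTube ID>.mp4"
    "ignoreerrors": True,         # skip clips that are no longer available
}
urls = ["https://www.youtube.com/watch?v=" + vid for vid in video_ids]

# With youtube_dl installed, the actual download would then be:
#   import youtube_dl
#   with youtube_dl.YoutubeDL(ydl_opts) as ydl:
#       ydl.download(urls)
print(urls[0])
```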

With the above naming convention, once you configure the path settings in the config.ini file and run preprocess.py:

  • For images, you get [YouTube ID of video].npz files, each containing 10 frames from a single video (extracted at 1 fps). The frames are resized to 256x256 in the default settings.
  • For audio, you get [YouTube ID of video].npz files, each containing 10 one-second spectrograms from a single video.
    I extract 10 samples because the video clips in AudioSet are 10 seconds long.

For more details, refer to utils/extractor.py.
You can also adjust the behavior by changing the parameters of the Extractor class methods.
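A minimal sketch of the 1-fps sampling idea described above. This is an illustrative assumption, not the repository's actual utils/extractor.py: the function name frame_indices and the array layout are made up for the example.

```python
import numpy as np

def frame_indices(video_fps, duration_s=10):
    # Keep one frame per second of the clip: the frame closest
    # to each whole-second mark.
    return [int(round(sec * video_fps)) for sec in range(duration_s)]

# e.g. a 30-fps, 10-second clip keeps frames 0, 30, 60, ..., 270
idx = frame_indices(30.0)

# The per-video .npz then holds 10 frames resized to 256x256
# (assumed HWC uint8 layout for this sketch).
frames = np.zeros((10, 256, 256, 3), dtype=np.uint8)
print(len(idx), frames.shape)
```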


Loss not decreasing

I have also faced this issue. It seems to occur because the last fully connected layer is tiny, and therefore much more vulnerable to noisy data than the other layers. Once that layer is misguided, it may never recover to the expected state.

Here are several tips that might help. However, please keep in mind that the network does not always train successfully even if you apply all of the solutions below.

1. Learning Rate
I found that a learning rate less than or equal to 5e-5 was helpful for successful training.
Training with a learning rate greater than 1e-4 is highly likely to fail.

2. Use a Larger Batch
Using a larger batch size usually helps training, as the data in AudioSet is quite noisy.
In my case, I use a batch size of 64.
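Tips 1 and 2 together might look like the setup below. The optimizer choice (Adam) and the stand-in model and data are assumptions for illustration; only the learning rate bound (≤ 5e-5) and the batch size (64) come from the tips above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(1, 2)  # stand-in for the real network, just for the sketch

# Tip 1: keep the learning rate at or below 5e-5.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)

# Tip 2: use a larger batch (64 here) to average out noisy AudioSet samples.
dataset = TensorDataset(torch.randn(256, 1), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=64, shuffle=True)
```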

3. In case of training AVE-Net: tweak the parameters of the last fully connected layer
As you can see in models/avenet.py, self.fc3 in AVE-Net has only 4 parameters (an nn.Linear(1, 2) layer: 2 weights and 2 biases).
Because this tiny layer is very sensitive to gradients, I initialize it with fixed values to make it more robust to noisy data.

Please change this part as shown below.

# fusion network (before)
self.fc3 = nn.Linear(1, 2)

# fusion network (after: fixed initialization)
self.fc3 = nn.Linear(1, 2)
self.fc3.weight.data[0] = -0.7
self.fc3.weight.data[1] = 0.7
self.fc3.bias.data[0] = 1.2
self.fc3.bias.data[1] = -1.2

4. One more tip
In my case, once the loss decreased below 0.69, the training went on successfully.
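For context on the 0.69 figure (my reading, not stated in the thread): a balanced binary classifier predicting at chance has a cross-entropy loss of ln 2 ≈ 0.693, so the loss falling below 0.69 suggests the network has started doing better than random guessing. A quick check:

```python
import math

# Cross-entropy of a uniform prediction (p = 0.5) on a binary task:
chance_loss = -math.log(0.5)   # equals ln 2
print(round(chance_loss, 4))   # 0.6931
```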


Comment: Pretrained models are available! Please use them if you need them.

If you have any questions or further issues, feel free to contact me.
You can also open issues in the repository; I will check them right away.

Sincerely, Kyuyeon.

@kyuyeonpooh kyuyeonpooh added the good first issue Good for newcomers label Nov 27, 2020