pretrain loss #56

Open
MarsMeng1994 opened this issue Jul 7, 2023 · 4 comments

@MarsMeng1994

Excuse me, what value should my pre-training loss reach before I can start fine-tuning TTS?
[image]
I found that my fine-tuned TTS model can generate a mel-spectrogram, but it differs greatly from the original mel-spectrogram.
[image]
Is this because the BART loss is too high?

@mechanicalsea
Contributor

As mentioned in the SpeechT5 paper: "We pre-train the proposed SpeechT5 model on 32 V100 GPUs with a batch size of around 90s samples per GPU for speech and 12k tokens per GPU for text and set the update frequency to 2 for 500k steps."
Thus, keep pre-training.
For TTS fine-tuning, the model pre-trained without $\mathcal{L}_{mlm}^s$ is more suitable, because, as mentioned in the paper, "The proposed SpeechT5 trained without $\mathcal{L}_{mlm}^s$ is considered because the bidirectional masked prediction loss is proposed to help the encoder learn to encode the speech signal, and this variant achieves superior Naturalness, as shown in Table 13 (in Appendix D)."
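
For reference, here is a rough back-of-the-envelope view of the pre-training scale implied by that quote. The numbers (32 GPUs, ~90 s of speech per GPU, 12k text tokens per GPU, update frequency 2, 500k updates) come from the sentence above; the rest is just arithmetic, not configuration read from this repo:

```python
# Rough scale of SpeechT5 pre-training as described in the paper quote above.
# Illustrative numbers only, not values taken from this repo's configs.

num_gpus = 32              # V100 GPUs
speech_per_gpu_s = 90      # ~90 seconds of speech per GPU per batch
text_tokens_per_gpu = 12_000
update_freq = 2            # gradient accumulation factor
total_updates = 500_000

# Effective amount of data consumed per optimizer update
speech_per_update_s = num_gpus * speech_per_gpu_s * update_freq   # 5760 s ≈ 1.6 h of audio
text_per_update = num_gpus * text_tokens_per_gpu * update_freq    # 768k text tokens

print(f"speech per update: {speech_per_update_s / 3600:.1f} h")
print(f"text tokens per update: {text_per_update:,}")
print(f"total updates: {total_updates:,}")
```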

@MarsMeng1994
Author

Thanks for the reply.
Does num_updates in the log mean steps? If so, it takes about 2 hours for every 100 steps in the picture, so pre-training would take about 10,000 hours?
Can I use an English pre-trained model to fine-tune a model for another language? Would that work?

@mechanicalsea
Contributor

10,000 hours seems too long. Actually, pre-training on 32 V100 GPUs took around one week, so pre-training with multiple GPUs is recommended.
Fine-tuning on other languages is possible by replacing the English vocabulary with the vocabulary of the fine-tuning language, but this causes a language mismatch between pre-training and fine-tuning, which may reduce the benefit of the pre-training.
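
To put the two time estimates side by side, here is the arithmetic behind them. The 2 h per 100 steps rate is taken from the log screenshot above and the one-week figure from this comment, so this is only an illustration of the scaling, not a measured benchmark:

```python
# Time estimates discussed above (illustrative arithmetic only).

total_updates = 500_000

# Observed rate from the log screenshot: ~100 updates every 2 hours.
hours_per_100_updates = 2.0
observed_hours = total_updates / 100 * hours_per_100_updates
print(f"at the observed rate: {observed_hours:,.0f} h "
      f"(~{observed_hours / 24:.0f} days)")        # ~10,000 h

# Reported wall-clock for the paper's setup: 32 V100 GPUs, about one week.
paper_days = 7
print(f"paper setup (32 V100s): ~{paper_days} days")
```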

@MarsMeng1994
Author

MarsMeng1994 commented Jul 12, 2023

Thanks for the reply.
I will try to use more GPUs. There is another question: during pre-training, num_workers is 0. Why not set it to a higher number, as in TTS fine-tuning?
[image]
Can I set it to a higher number to accelerate pre-training?

When I set num_workers=1, there is an error like:
RuntimeError: unable to mmap 408 bytes from file </torch_2632095_3802486040_258611>: Cannot allocate
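
The mmap failure above is a shared-memory error that commonly appears when PyTorch DataLoader worker processes (num_workers > 0) exchange tensors through /dev/shm, e.g. when shared memory is small inside a container. A workaround that is often suggested for this error, though not confirmed in this thread, is to switch PyTorch's tensor sharing strategy to the file system (or to enlarge /dev/shm / raise system limits); a minimal sketch:

```python
import torch.multiprocessing as mp

# Workaround often suggested for "unable to mmap ... Cannot allocate" errors
# raised when DataLoader workers share tensors via shared memory: share
# tensors through the file system instead. This is a general PyTorch setting,
# not a change taken from the SpeechT5/fairseq code in this repo.
mp.set_sharing_strategy("file_system")
```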
