How to use trained model in inference? #193

solee0022 · 2023-10-23T11:03:43Z

I trained a model on a Korean dataset and got checkpoints for both the Discriminator and the Generator.
Is it correct to use only the Generator checkpoint ('G_20000.pth'), without the Discriminator, for inference?
I synthesized Korean audio using only G_20000.pth, but the result sounded terrible.
Below is the code I changed.

net_g = SynthesizerTrn( len(lang_symbols['en']), hps.data.filter_length // 2 + 1, hps.train.segment_size // hps.data.hop_length, **hps.model).cuda()
_ = net_g.eval()
_ = utils.load_checkpoint("/path/to/G_20000.pth", net_g, None)


CreativeSelf0 · 2023-11-01T01:40:10Z

@solee0022, I've identified the root of the issue. It's related to my training configuration file. Inside configs/your_config_name.json, there's a flag cleaned_text: True. When set to True, the model assumes that the input data is already preprocessed. Unfortunately, my data wasn't preprocessed.
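To see what your own config says, you can read the flag directly. This is a minimal sketch: it assumes the flag sits under a `data` section next to `training_files` and `text_cleaners` (adjust the keys if your config is laid out differently). The helper name `describe_text_settings`, the cleaner name, and the file path are made up for illustration.

```python
import json


def describe_text_settings(config_text):
    """Return the text-related settings from a training config (key names are assumed)."""
    data = json.loads(config_text).get("data", {})
    return {
        "cleaned_text": bool(data.get("cleaned_text", False)),
        "text_cleaners": data.get("text_cleaners", []),
        "training_files": data.get("training_files"),
    }


# Illustrative config fragment, not taken from the repository.
example_config = """
{
  "data": {
    "training_files": "filelists/train.txt",
    "text_cleaners": ["korean_cleaners"],
    "cleaned_text": true
  }
}
"""

settings = describe_text_settings(example_config)
if settings["cleaned_text"]:
    print("cleaned_text is true: the training file list must already contain cleaned text.")
else:
    print("cleaned_text is false: cleaners", settings["text_cleaners"], "run at load time.")
```

If the flag is true but the file list still holds raw text, you are likely in the situation described here.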

The consequence? Training skips the cleaner function, which I was relying on to convert graphemes to phoneme sequences. As a result, I unintentionally 😅 trained the model on characters rather than phonemes.
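The mismatch described above can be illustrated with a toy example. Everything here is hypothetical (the symbol table, the lookup standing in for a cleaner, and the function names); it only shows that the same input text maps to different ID sequences depending on whether a cleaner runs.

```python
# Toy symbol table: IDs are arbitrary and only meant to show the mismatch.
symbols = ["_", "a", "h", "i", "t", "ɪ", "ð", "s"]
symbol_to_id = {s: i for i, s in enumerate(symbols)}

# Hypothetical grapheme-to-phoneme lookup standing in for a real cleaner.
toy_g2p = {"this": "ðɪs"}


def to_ids(text):
    """Map each known symbol to its ID, skipping unknown ones."""
    return [symbol_to_id[ch] for ch in text if ch in symbol_to_id]


def ids_without_cleaner(text):
    # What the model sees in training when the cleaner is skipped.
    return to_ids(text)


def ids_with_cleaner(text):
    # What a cleaner-based inference path feeds the model.
    return to_ids(toy_g2p[text])


raw_ids = ids_without_cleaner("this")
phoneme_ids = ids_with_cleaner("this")
print("without cleaner:", raw_ids)
print("with cleaner:   ", phoneme_ids)
```

A model trained on the first kind of sequence has never seen the second kind, which would explain poor audio even with a correctly loaded checkpoint.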

Moreover, the get_text function in the inference.ipynb notebook assumes that the text is not preprocessed, which makes sense for inference. Because of this inconsistency, the model receives a phoneme sequence at inference time that it never saw during training. To check whether this is the issue in your case, edit get_text in inference.ipynb so that it calls cleaned_text_to_sequence instead of text_to_sequence, and observe the results.

from text import text_to_sequence, cleaned_text_to_sequence

def get_text(text, hps):
    text_norm = cleaned_text_to_sequence(text)
    if hps.data.add_blank:
        text_norm = commons.intersperse(text_norm, 0)
    text_norm = torch.LongTensor(text_norm)
    return text_norm

It's also a good idea to clean your text once, as an offline preprocessing step, so that training doesn't spend time running the cleaner.
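As a rough sketch of such an offline step, the snippet below cleans the text column of a file list once and writes the result to a new file. It assumes a `path|text` line format; `toy_cleaner`, `preprocess_filelist`, and the file names are hypothetical stand-ins, so substitute the cleaner you actually use for your language.

```python
import os
import tempfile


def toy_cleaner(text):
    """Hypothetical stand-in for a real grapheme-to-phoneme cleaner."""
    return " ".join(text.lower().split())


def preprocess_filelist(in_path, out_path, cleaner, text_index=1):
    """Clean one column of a `|`-separated file list and write a new file."""
    count = 0
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            fields = line.split("|")
            fields[text_index] = cleaner(fields[text_index])
            dst.write("|".join(fields) + "\n")
            count += 1
    return count


# Demo on a throwaway file list so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    raw_list = os.path.join(tmp, "train.txt")
    cleaned_list = os.path.join(tmp, "train.txt.cleaned")
    with open(raw_list, "w", encoding="utf-8") as f:
        f.write("wavs/0001.wav|Hello   World\nwavs/0002.wav|Second LINE\n")

    n_lines = preprocess_filelist(raw_list, cleaned_list, toy_cleaner)
    with open(cleaned_list, encoding="utf-8") as f:
        cleaned_lines = f.read().splitlines()

print(n_lines, cleaned_lines)
```

After a step like this, the training file list in your config would point to the cleaned file, and cleaned_text: True would then describe the data accurately.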

ToiYeuTien · 2023-12-27T04:13:54Z

@solee0022, I've identified the root of the issue. It's related to my training configuration file. Inside configs/your_config_name.json there's a flag cleaned_text: True. When set to True, the model assumes that the input data is already preprocessed. Unfortunately, my data wasn't preprocessed.

The consequence? The model skips the cleaner function, which I was using to convert from grapheme to phoneme sequences. As a result, I unintentionally 😅 trained the model on a character-based approach rather than a phoneme-based one.

Moreover, in the inference.ipynb notebook, the get_text function operates under the assumption that the text isn't preprocessed, which makes sense for inference. This inconsistency means that the synthesized audio is based on a phoneme sequence that the model isn't familiar with. To verify whether this is the issue in your case, edit the get_text function inside the inference.ipynb notebook so that it uses cleaned_text_to_sequence instead of text_to_sequence, and observe your results.
from text import text_to_sequence, cleaned_text_to_sequence

def get_text(text, hps):
    text_norm = cleaned_text_to_sequence(text)
    if hps.data.add_blank:
        text_norm = commons.intersperse(text_norm, 0)
    text_norm = torch.LongTensor(text_norm)
    return text_norm
In addition, it's a good idea to preprocess your text beforehand as an offline preprocessing step, so that no time is spent on cleaning while the model trains.

Thank you, I made the change you described and it worked.
I have a Vietnamese dataset with more than 10 hours of spoken audio. I trained for 855,000 steps, but the speech is still not clear. It sounds like a child learning to talk.
