Calling melo CLI for "ZH" long coldstart times, even if cached #130

Open
zihaolam opened this issue May 11, 2024 · 0 comments

zihaolam commented May 11, 2024

Running this command gives me consistent output in approximately 7 seconds:
melo 我的名字叫小杨 dog.wav --language ZH

/Users/zihaolam/Projects/tts-editor/MeloTTS/melo/main.py:71: UserWarning: You specified a speaker but the language is English.
  warnings.warn("You specified a speaker but the language is English.")
loading pickled model from cache
loaded pickled model from cache, took 8.529947996139526
 > Text split to sentences.
我的名字叫小杨
 > ===========================
  0%|                                                                  | 0/1 [00:00<?, ?it/s]Building prefix dict from the default dictionary ...
Loading model from cache /var/folders/j4/zkddp3ms6493qzbf3qf7rfwr0000gn/T/jieba.cache
Loading model cost 0.406 seconds.
Prefix dict has been built successfully.
Some weights of the model checkpoint at bert-base-multilingual-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
/Users/zihaolam/Projects/tts-editor/MeloTTS/.venv/lib/python3.9/site-packages/torch/nn/functional.py:4522: UserWarning: MPS: The constant padding of more than 3 dimensions is not currently supported natively. It uses View Ops default implementation to run. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Pad.mm:472.)
  return torch._C._nn.pad(input, pad, mode, value)
/Users/zihaolam/Projects/tts-editor/MeloTTS/melo/commons.py:123: UserWarning: MPS: no support for int64 for min_max, downcasting to a smaller data type (int32/float32). Native support for int64 has been added in macOS 13.3. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/ReduceOps.mm:612.)
  max_length = length.max()
100%|██████████████████████████████████████████████████████████| 1/1 [00:07<00:00,  7.51s/it]
The run above already loads the model from a pickled copy, produced by this helper:

import os
import pickle
import time


def get_model_pkl_path(language: str) -> str:
    # One cache file per language, stored next to this script.
    return os.path.join(os.path.dirname(__file__), f"model_{language}.pkl")


def get_model(language: str, device: str):
    model_pkl_path = get_model_pkl_path(language)
    if not os.path.exists(model_pkl_path):
        from melo.api import TTS

        # First run: build the model normally, then pickle it for later runs.
        model = TTS(language=language, device=device)
        with open(model_pkl_path, "wb") as f:
            pickle.dump(model, f)
    else:
        # Subsequent runs: restore the model from the pickle.
        with open(model_pkl_path, "rb") as f:
            start = time.time()
            print("loading pickled model from cache")
            model = pickle.load(f)
            print("loaded pickled model from cache, took", time.time() - start)
    return model
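For completeness, the call site looks roughly like this (the speaker key and file name here are just for illustration):

model = get_model("ZH", "mps")  # "mps" matches the MPS warnings in the log above
model.tts_to_file("我的名字叫小杨", model.hps.data.spk2id["ZH"], "dog.wav")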

Using pickle for the TTS model still does not help: generating a short sentence still takes approximately 7 seconds.

Is there a way to improve the speed, or to cache anything further to reduce this cold start?
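
For reference, a quick way to check whether the cold start is dominated by Python imports (torch, transformers, and friends) rather than by the checkpoint itself is to time the import on its own (python -X importtime also gives a per-module breakdown):

import time

start = time.time()
from melo.api import TTS  # noqa: E402 -- deliberately timing this import

print(f"importing melo.api took {time.time() - start:.1f}s")

If the import alone accounts for most of the 7 seconds, pickling the model object cannot help, because unpickling it still has to import the same modules.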

The Gradio web UI takes approximately 1 second to generate the same text. However, I would like to use the CLI instead of running a Python server. Is there a way to optimise anything so that the CLI takes the same time as the web UI/server?
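
If the web UI is faster simply because it keeps the model resident in memory, one workaround might be a small long-running worker instead of a fresh CLI process per request. A minimal sketch, assuming the melo.api.TTS interface shown in the README (tts_to_file and hps.data.spk2id) and a made-up tab-separated stdin protocol:

import sys

from melo.api import TTS

# Pay the import and initialisation cost exactly once, at startup.
model = TTS(language="ZH", device="auto")
speaker_id = model.hps.data.spk2id["ZH"]  # speaker key for Chinese, per the README

# Hypothetical protocol: one "<output.wav>\t<text>" request per stdin line.
for line in sys.stdin:
    output_path, text = line.rstrip("\n").split("\t", 1)
    model.tts_to_file(text, speaker_id, output_path)
    print(f"wrote {output_path}", flush=True)  # ack so a caller can wait on it

Every request after startup then costs only synthesis time, which is presumably what the Gradio server benefits from.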
