Emphasis on syllables – How to choose? #53

maniupo · 2023-07-03T08:29:03Z

Hi there,

Over the last few days I've been trying out the Thorsten voice in a Python virtual environment, set up as described in the German-language video "Freie Thorsten Stimme in LINUX lokal nutzen Text-to-Speech TTS Tutorial".

I'm amazed by how natural the voice sounds. Only in a few words did I find the emphasis placed on syllables that don't usually receive it in spoken German.

One test phrase, for example, contained the English loanword "Marketing", which was stressed on the second syllable.

So I wondered whether there is a way to instruct the tts program or tts-server to put the emphasis on the first syllable instead.

While searching the web, I came across a question where the original poster said:

I know that some voice engines use special characters like + or ' in front of a stressed vowel.

I tried this suggestion several times (mainly applying it to syllables rather than vowels, though), using different methods:

Directly, by executing the following commands:
tts --text "Marketing." --model_name tts_models/de/thorsten/vits --out_path marketing1.wav
tts --text "+Marketing." --model_name tts_models/de/thorsten/vits --out_path marketing2.wav
tts --text "'Marketing." --model_name tts_models/de/thorsten/vits --out_path marketing3.wav
By starting a server:
tts-server --model_name tts_models/de/thorsten/vits

and subsequently using:

a) the browser at localhost:5002/, inserting the strings
"+Marketing." (saved as marketing4.wav) and
"'Marketing." (saved as marketing5.wav).

b) curl:
curl -o marketing6.wav http://localhost:5002/api/tts?text=+Marketing.
curl -o marketing7.wav http://localhost:5002/api/tts?text=\'Marketing.

c) cTTS (Python3):
import cTTS
cTTS.synthesizeToFile("marketing8.wav", "+Marketing.")
cTTS.synthesizeToFile("marketing9.wav", "'Marketing.")
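One detail that may matter for the curl variant: in a URL query string, a literal + is conventionally decoded as a space, so the server may never have seen the plus sign at all. The sketch below uses only the Python standard library to show what a standard query parser makes of the raw URL, and how percent-encoding preserves the character. Whether tts-server decodes its query this way is an assumption, and `tts_url` and `decoded_text` are hypothetical helper names.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Endpoint taken from the curl calls above.
BASE = "http://localhost:5002/api/tts"


def tts_url(text: str) -> str:
    """Build a request URL with the text fully percent-encoded."""
    return f"{BASE}?text={quote(text, safe='')}"


def decoded_text(url: str) -> str:
    """Return the 'text' parameter as a standard query-string parser sees it."""
    query = urlsplit(url).query
    return parse_qs(query, keep_blank_values=True)["text"][0]


# The URL as it was passed to curl: the '+' is read as a space.
raw = "http://localhost:5002/api/tts?text=+Marketing."
print(repr(decoded_text(raw)))

# Percent-encoded variant: the '+' survives the round trip.
encoded = tts_url("+Marketing.")
print(encoded)
print(repr(decoded_text(encoded)))
```

With curl itself, the same effect should be achievable by quoting the URL and writing %2B instead of +.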

You can find the resulting sound files attached, packed in a zip file.

To my ears, there is not much difference between them, though. The emphasis still seems to rest mainly on the second syllable.

Now I'm wondering what else I could try. If you have any ideas or suggestions, I would greatly appreciate hearing them.

I should mention that I am only taking my first steps in programming. As for my system, I am working on an up-to-date Linux system (a derivative of Debian 11, without systemd). It's an older machine, though, which is probably why I can only use the vits model at the moment.

Thanks in advance

marketing_wav.zip


thorstenMueller · 2023-07-05T17:09:37Z

Hi and thanks for your feedback 😊,
maybe adding a (+) only works on phoneme input and not on text input. But luckily my next video will be about adjusting pronunciation. I can post the video link here once it's released.

thorstenMueller · 2023-07-09T07:18:03Z

I've released a tutorial video on my "Thorsten-Voice" YouTube channel showing how to fix TTS pronunciation issues by adjusting the eSpeak(-ng) dictionary.
https://youtu.be/493xbPIQBSU

Hope this helps you.

maniupo · 2023-07-11T18:34:33Z

Great, thanks! I learned (and laughed) a lot while going through that video. It's really impressive how much the pronunciation of a word can change…

With my system-wide espeak-ng installation, everything worked as you showed, so I could make some temporary, funny changes to the pronunciation of German words.

However, the example I was working on, "Marketing", didn't need any change there. In my system-wide espeak-ng, it already has its emphasis on the right syllable(s) (IPA: mˈaɾkeːtˌɪŋ), though spoken in quite a metallic voice.

Later I also found out that changes made here have no effect on the Thorsten voice installed in my virtual environment. Apparently, it uses its own, different phonemizer.

Judging by the installed lexicon databases, I now think an application called Gruut does this job. In the lexicon database –/opt/tts/lib/python3.9/site-packages/gruut_lang_de/lexicon.db there is an entry:

INSERT INTO word_phonemes VALUES(151977,'marketing',0,'m a ʁ k eː t ɪ ŋ','');

So here the pronunciation is encoded as "m a ʁ k eː t ɪ ŋ", without any stress marks, while espeak encodes it as "mˈaɾkeːtˌɪŋ".

espeak-ng -vde "Marketing" --ipa
mˈaɾkeːtˌɪŋ
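To make the difference between the two encodings explicit, here is a minimal Python sketch that counts the IPA stress marks (ˈ for primary, ˌ for secondary) in each string. The two pronunciation strings are copied from above. The helper name `stress_marks` is hypothetical and not part of gruut or espeak-ng.

```python
# IPA stress marks: primary (ˈ) and secondary (ˌ).
PRIMARY = "ˈ"
SECONDARY = "ˌ"


def stress_marks(pronunciation: str) -> dict:
    """Count primary and secondary stress marks in an IPA string."""
    return {
        "primary": pronunciation.count(PRIMARY),
        "secondary": pronunciation.count(SECONDARY),
    }


gruut_entry = "m a ʁ k eː t ɪ ŋ"   # from gruut_lang_de/lexicon.db
espeak_output = "mˈaɾkeːtˌɪŋ"       # from espeak-ng -vde --ipa

for name, pron in [("gruut", gruut_entry), ("espeak-ng", espeak_output)]:
    print(name, stress_marks(pron))
```

The gruut entry carries no stress information at all, while the espeak-ng output marks a primary stress before the first vowel and a secondary stress before the last one.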

Now, how can the gruut database be adapted? In the activated virtual environment, I tried its help function:
gruut --help

On pypi.org there is a recommendation to use
python3 -m gruut <LANGUAGE> --help
which produced similar output.

As I found out, there is an option to choose an included espeak database:
--espeak Use eSpeak versions of lexicons (overrides --model-prefix)

There, in the file –/opt/tts/lib/python3.9/site-packages/gruut_lang_de/espeak/lexicon.db the entry looks quite similar, but it does carry a primary stress mark on the first syllable:

INSERT INTO word_phonemes VALUES(152622,'marketing',0,'m ˈa ɾ k eː t ˌɪ ŋ','');
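Since the entries appear as SQL INSERT statements, the lexicon files are presumably SQLite databases and could be inspected with Python's built-in sqlite3 module. The sketch below is a stand-in that runs against an in-memory database seeded with the row shown above. The table name comes from that row, but the column names are assumptions and should be checked against the real schema first (the `table_schema` helper shows one way).

```python
import sqlite3

# In-memory stand-in for lexicon.db. Column names are ASSUMED for
# illustration; only the table name and the row values come from the thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE word_phonemes ("
    "id INTEGER PRIMARY KEY, word TEXT, pron_order INTEGER, "
    "phonemes TEXT, roles TEXT)"
)
conn.execute(
    "INSERT INTO word_phonemes VALUES(152622,'marketing',0,'m ˈa ɾ k eː t ˌɪ ŋ','')"
)


def table_schema(conn: sqlite3.Connection, table: str):
    """Return the CREATE TABLE statement, to verify the real column names."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return row[0] if row else None


def lookup(conn: sqlite3.Connection, word: str) -> list:
    """Return all stored pronunciations for a word (lower-cased for lookup)."""
    rows = conn.execute(
        "SELECT phonemes FROM word_phonemes WHERE word = ? ORDER BY pron_order",
        (word.lower(),),
    )
    return [row[0] for row in rows]


print(table_schema(conn, "word_phonemes"))
print(lookup(conn, "Marketing"))
```

For the real file, it would be safer to work on a copy or to open it read-only, for example with `sqlite3.connect("file:lexicon.db?mode=ro", uri=True)`.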

Hoping to switch temporarily to this lexicon database in the tts virtual environment, I tried the command
gruut --espeak
which led to the result
Reading input from stdin...
without the process terminating by itself. So apparently it needs some more input (maybe in combination with a tts command?).
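The message "Reading input from stdin..." suggests the program is simply waiting for text on standard input, so it should finish once text is piped in and the input is closed. Here is a minimal Python sketch of that pattern using subprocess. To keep it runnable without gruut, it demonstrates the idea with a stand-in command that echoes its input. The gruut command line in the final comment is an assumption, not verified against the version installed in this thread.

```python
import subprocess
import sys


def run_with_stdin(cmd: list, text: str) -> str:
    """Run cmd, send text on its stdin, close stdin, and return its stdout."""
    result = subprocess.run(
        cmd, input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


# Stand-in command: a tiny Python process that copies stdin to stdout.
echo_cmd = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
print(run_with_stdin(echo_cmd, "Marketing\n"))

# ASSUMED gruut invocation (flags may differ between gruut versions):
# print(run_with_stdin(["gruut", "--language", "de", "--espeak"], "Marketing\n"))
```

On the shell, the equivalent would be piping the word into the command with echo, or typing the text and ending the input with Ctrl+D.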

But maybe choosing between two databases distracts from the more useful goal of adapting the pronunciation of individual words. At the moment, I have no idea how to edit the lexicon databases.

Speaking of phonemes, I found out that there is gruut-ipa, a

Library for manipulating International Phonetic Alphabet (IPA) pronunciations.

When I issued the command
pip install gruut-ipa
as suggested on pypi.org, I found out that it was already installed in the tts virtual environment.

So I tried to find out what it does:

$ gruut-ipa --help
usage: gruut_ipa [-h] {print,describe,phones,phonemes,convert} ...

positional arguments:
  {print,describe,phones,phonemes,convert}
    print               Print all known IPA phones
    describe            Describe IPA phone(s)
    phones              Group phones in IPA pronunciation
    phonemes            Group phones in IPA pronunciation according to language phonemes
    convert             Convert pronunciations between ipa, espeak, and sampa

optional arguments:
  -h, --help            show this help message and exit

That's how far I got today. Maybe another day will show whether and how it can help in adapting the pronunciation of some words.

By the way, if my posts are infrequent or late, this is partly due to the required two-factor authentication, which makes logging in somewhat more difficult for me here.

thorstenMueller · 2023-07-12T17:49:51Z

I'm happy you found my recent video on that helpful (and a little bit funny) 😆. As I'm not the greatest "gruut" expert, let me tag @synesthesiam (Michael Hansen), as he's the mastermind behind gruut and a phonetics expert.

synesthesiam · 2023-07-19T15:35:11Z

@maniupo This sounds like a good situation for SSML, specifically the <phoneme> element. gruut has some limited support for this with the --ssml flag, but I don't think Coqui TTS has made use of it yet 🙁
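For reference, the SSML `<phoneme>` element wraps a word and supplies its pronunciation through the `alphabet` and `ph` attributes. The following Python sketch builds such a snippet with the standard library, using the espeak-ng IPA string quoted earlier in the thread. The helper name `phoneme_ssml` is hypothetical, and whether gruut's limited SSML support accepts exactly this form is not confirmed here.

```python
import xml.etree.ElementTree as ET


def phoneme_ssml(word: str, ipa: str) -> str:
    """Wrap a word in <speak><phoneme alphabet="ipa" ph="...">…</phoneme></speak>."""
    speak = ET.Element("speak")
    phoneme = ET.SubElement(speak, "phoneme", {"alphabet": "ipa", "ph": ipa})
    phoneme.text = word
    # encoding="unicode" returns a str and keeps the IPA characters readable.
    return ET.tostring(speak, encoding="unicode")


# IPA string taken from the espeak-ng output quoted in the thread.
ssml = phoneme_ssml("Marketing", "mˈaɾkeːtˌɪŋ")
print(ssml)
```

Building the markup with an XML library rather than string concatenation keeps attribute values correctly escaped if a pronunciation ever contains quotes or angle brackets.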

thorstenMueller · 2023-07-20T19:38:41Z

but I don't think Coqui TTS has made use of it yet

I guess you're right @synesthesiam . AFAIK Coqui TTS doesn't support SSML yet.
