
Transformer unable to predict double phonemes #20

Open
Frank995 opened this issue Apr 28, 2022 · 4 comments
Frank995 commented Apr 28, 2022

Hello.
I found out a bug where the transformer model is unable to learn sequences of two or more consecutive identical phonemes. I first discovered it for italian which has double consonants and then applied it to english as well. Take the words holy and wholly as example. According to WordReference, their RP (probably outdated) pronunciation should be respectively: həʊli and həʊlli. I don't know how common is the latter with a geminated l sound but it doesn't really matter. What matters is that even with char repeats equal to 3 or 5 the transformer is unable to predict double phonemes.

It can be easily reproduced by running the run_training.py debug script with the default yaml file and this data:


train_data = [('en_us', 'holy', 'həʊli'),
              ('en_us', 'wholly', 'həʊlli')] * 50

val_data = [('en_us', 'holy', 'həʊli'),
            ('en_us', 'wholly', 'həʊlli')] * 60

config_file = 'forward_config.yaml'

preprocess(config_file=config_file,
           train_data=train_data,
           val_data=val_data,
           deduplicate_train_data=False)

train(config_file=config_file)

Even in a heavily overfitting setup you will see that the prediction is always həʊli. Reproduction rate: 100%.

@Frank995 Frank995 changed the title Transformer unable to predict double consonants Transformer unable to predict double phonemes May 3, 2022

Frank995 commented May 3, 2022

Hi, I've updated the first post with reproduction code and new info. I hope it will get addressed. I suspected the loss function to be the most likely cause, but I wasn't able to confirm it.

cschaefer26 (Collaborator) commented
Hi, thanks for mentioning the problem. I have actually seen that issue in the past but didn't have time to address it. The problem is not overfitting but the decoding of the CTC output to phonemes, where a deduplication happens. One would need to replace the deduplication with a better method that allows phoneme duplicates. In case you are interested in messing with the code, here is the function:

def get_dedup_tokens(logits_batch: torch.Tensor) \
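To illustrate the failure mode, here is a minimal sketch of the two decoding strategies (illustrative only, not the repository's actual get_dedup_tokens implementation): a naive deduplication merges every run of identical tokens, so a genuine double phoneme can never survive, whereas a standard blank-aware CTC collapse merges only repeats that are not separated by the blank token, so the model can express həʊlli by emitting a blank between the two l's.

```python
# Sketch of CTC-style decoding, assuming "_" is the blank token.
# Not the repository's code; just demonstrates why naive dedup
# loses double phonemes while blank-aware collapse keeps them.
BLANK = "_"

def naive_dedup(tokens):
    """Merge every run of identical tokens (what plain dedup does)."""
    out = []
    for t in tokens:
        if not out or t != out[-1]:
            out.append(t)
    return out

def ctc_collapse(tokens):
    """Standard CTC collapse: merge consecutive repeats, drop blanks.

    A double phoneme survives only if the model emits a blank
    between the two identical tokens, e.g. ['l', '_', 'l'].
    """
    out = []
    prev = None
    for t in tokens:
        if t != prev and t != BLANK:
            out.append(t)
        prev = t
    return out

# Naive dedup can never produce a double 'l':
print(naive_dedup(list("həʊlli")))   # ['h', 'ə', 'ʊ', 'l', 'i']
# Blank-aware collapse preserves it when a blank separates the repeats:
print(ctc_collapse(["h", "ə", "ʊ", "l", "_", "l", "i"]))
# ['h', 'ə', 'ʊ', 'l', 'l', 'i']
```

Under this sketch, the fix would be to make the model emit (and the decoder respect) a blank between repeated phonemes rather than deduplicating unconditionally.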


Frank995 commented May 6, 2022

Got it. Any particular reason for not using cross-entropy in the transformer too?

cschaefer26 (Collaborator) commented
Do you mean cross-attention?
