
Transformer unable to predict double phonemes #20

Open
Frank995 opened this issue Apr 28, 2022 · 4 comments
Frank995 commented Apr 28, 2022

Hello.
I found out a bug where the transformer model is unable to learn sequences of two or more consecutive identical phonemes. I first discovered it for italian which has double consonants and then applied it to english as well. Take the words holy and wholly as example. According to WordReference, their RP (probably outdated) pronunciation should be respectively: həʊli and həʊlli. I don't know how common is the latter with a geminated l sound but it doesn't really matter. What matters is that even with char repeats equal to 3 or 5 the transformer is unable to predict double phonemes.

It can be easily reproduced by running the run_training.py debug script with the default yaml file and this data:


train_data = [('en_us', 'holy', 'həʊli'),
              ('en_us', 'wholly', 'həʊlli')] * 50

val_data = [('en_us', 'holy', 'həʊli'),
            ('en_us', 'wholly', 'həʊlli')] * 60

config_file = 'forward_config.yaml'

preprocess(config_file=config_file,
           train_data=train_data,
           val_data=val_data,
           deduplicate_train_data=False)

train(config_file=config_file)

Even in a heavily overfitting setup you will see that the prediction is always həʊli. Reproduction rate: 100%.

@Frank995 Frank995 changed the title Transformer unable to predict double consonants Transformer unable to predict double phonemes May 3, 2022

Frank995 commented May 3, 2022

Hi, I've updated the first post with reproduction code and new info. I hope it will get addressed. I suspected the loss function to be the most likely cause, but I wasn't able to confirm it.

cschaefer26 (Collaborator) commented
Hi, thanks for mentioning the problem. I have actually seen that issue in the past but didn't have time to address it. The problem is not overfitting but the decoding of the CTC output to phonemes, where a deduplication happens. One would need to replace the deduplication with a better method that allows phoneme duplicates. In case you are interested in messing with the code, here is the function:

def get_dedup_tokens(logits_batch: torch.Tensor) \
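To illustrate the failure mode, here is a minimal sketch of the two decoding strategies (illustrative only, not the repository's actual get_dedup_tokens implementation): a naive deduplication merges every run of identical tokens, so a genuine double phoneme can never survive, whereas a standard blank-aware CTC collapse merges only repeats that are not separated by the blank token, so the model can express həʊlli by emitting a blank between the two l's.

```python
# Sketch of CTC-style decoding, assuming "_" is the blank token.
# Not the repository's code; just demonstrates why naive dedup
# loses double phonemes while blank-aware collapse keeps them.
BLANK = "_"

def naive_dedup(tokens):
    """Merge every run of identical tokens (what plain dedup does)."""
    out = []
    for t in tokens:
        if not out or t != out[-1]:
            out.append(t)
    return out

def ctc_collapse(tokens):
    """Standard CTC collapse: merge consecutive repeats, drop blanks.

    A double phoneme survives only if the model emits a blank
    between the two identical tokens, e.g. ['l', '_', 'l'].
    """
    out = []
    prev = None
    for t in tokens:
        if t != prev and t != BLANK:
            out.append(t)
        prev = t
    return out

# Naive dedup can never produce a double 'l':
print(naive_dedup(list("həʊlli")))   # ['h', 'ə', 'ʊ', 'l', 'i']
# Blank-aware collapse preserves it when a blank separates the repeats:
print(ctc_collapse(["h", "ə", "ʊ", "l", "_", "l", "i"]))
# ['h', 'ə', 'ʊ', 'l', 'l', 'i']
```

Under this sketch, the fix would be to make the model emit (and the decoder respect) a blank between repeated phonemes rather than deduplicating unconditionally.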


Frank995 commented May 6, 2022

Got it. Any particular reason for not using cross-entropy in the transformer too?

cschaefer26 (Collaborator) commented
Do you mean cross-attention?
