Basic iwslt config train failure due to directory errors #214
Comments
comment added to the script joeynmt/scripts/get_iwslt14_bpe.sh Lines 6 to 15 in d6247c5
Describe the bug
When trying to run one of the iwslt example config training runs, I repeatedly got errors due to the BPE code files not being properly moved to the model directory.
The fix is to ensure that the correct tokenizer data directory structure is present before training begins. The relevant line is
joeynmt/joeynmt/tokenizers.py
Lines 309 to 311 in 0968187
which I replaced with:
to fix my specific case. It probably needs to be added elsewhere too (at least to the other tokenizer classes). Note: I removed the as_posix() call for another reason during testing, but that is not relevant to this bug.
I was able to reproduce this bug, as well as the fix, on two different machines. I am happy to contribute the patch if this is truly a bug and I am not missing something simple.
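For illustration, the idea behind the fix (make sure the target directory exists before the BPE codes file is copied) can be sketched roughly as follows. This is a minimal standalone sketch, not the actual JoeyNMT code or the patch from this issue: the function name copy_codes_file and its parameters are hypothetical, and only standard-library calls are used.

```python
import shutil
from pathlib import Path


def copy_codes_file(codes: Path, model_dir: Path) -> Path:
    """Copy a BPE codes file into model_dir, creating the directory first.

    Hypothetical helper for illustration only; it is not part of JoeyNMT.
    """
    codes = Path(codes)
    model_dir = Path(model_dir)

    # Fail early with a clear message if the source file is missing.
    if not codes.is_file():
        raise FileNotFoundError(f"BPE codes file not found: {codes}")

    # Create the model directory (and any missing parents) before copying,
    # so the copy does not fail on a missing directory.
    model_dir.mkdir(parents=True, exist_ok=True)

    target = model_dir / codes.name
    shutil.copy2(codes, target)  # copy2 also preserves file metadata
    return target
```

A tokenizer class could call a helper like this where it currently copies its codes file into the model directory; whether the other tokenizer classes need the same change would have to be checked against the code.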
To Reproduce
Steps to reproduce the behavior:
1. Run the scripts/get_iwslt14_bpe.sh script
2. Build the vocabulary (using configs/iwslt14_deen_bpe.yaml): python scripts/build_vocab.py configs/iwslt14_deen_bpe.yaml
3. Start training: python3 -m joeynmt train configs/iwslt14_deen_bpe.yaml
Logged output
Relevant log, showing that the BPE code files were not properly copied over:
Expected behavior
All tokenizer information copied over, and training running as normal.
System (please complete the following information):
Config file: