File exists but it doesn't found it!!!!! #78

Open
puraminy opened this issue Jun 20, 2020 · 1 comment

Comments

@puraminy

When I run a Python script from a Jupyter notebook, I receive the following error:


    ~/miniconda3/lib/python3.7/site-packages/fastai/text/data.py in train_sentencepiece(texts, path, pre_rules, post_rules, vocab_sz, max_vocab_sz, model_type, max_sentence_len, lang, char_coverage, tmp_dir, enc)
        434         f"--unk_id={len(defaults.text_spec_tok)} --pad_id=-1 --bos_id=-1 --eos_id=-1",
        435         f"--user_defined_symbols={','.join(spec_tokens)}",
    --> 436         f"--model_prefix={quotemark}{cache_dir/'spm'}{quotemark} --vocab_size={vocab_sz} --model_type={model_type}"]))
        437     raw_text_path.unlink()
        438     return cache_dir
    
    OSError: Not found: ""/home/pouramini/mf1/data/wiki/fa-2/models/fsp15k/all_text.out"": No such file or directory Error #2

However, the file exists! I wonder why the path is shown wrapped in two pairs of double quotes?

This is the code that raises the error; it looks for raw_text_path:

    raw_text_path = cache_dir + '/all_text.out'
    with open(raw_text_path, 'w', encoding=enc) as f: f.write("\n".join(texts))
    spec_tokens = ['\u2581'+s for s in defaults.text_spec_tok]
    SentencePieceTrainer.Train(" ".join([
        f'--input={raw_text_path} --max_sentence_length={max_sentence_len}',
        f"--character_coverage={ifnone(char_coverage, 0.99999 if lang in full_char_coverage_langs else 0.9998)}",
        f"--unk_id={len(defaults.text_spec_tok)} --pad_id=-1 --bos_id=-1 --eos_id=-1",
        f"--user_defined_symbols={','.join(spec_tokens)}",
        f"--model_prefix={cache_dir/'spm'} --vocab_size={vocab_sz} --model_type={model_type}"]))
    raw_text_path.unlink()
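
On the doubled quotes: the traceback shows the installed fastai wrapping paths in a `quotemark` variable, so one plausible reading is that the inner pair of quotes is part of the path string itself and the outer pair comes from the error message. The sketch below is not sentencepiece's actual parser. It only illustrates, in plain Python, that a flag string split without shell-style unquoting keeps the literal quotes in the file name. The path and both helper functions are hypothetical.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical values, for illustration only.
quotemark = '"'
raw_text_path = PurePosixPath("/data/models/all_text.out")

# Build the flag the same way the f-string in the traceback does.
flag = f"--input={quotemark}{raw_text_path}{quotemark}"

def parse_naive(arg_string):
    """Split on whitespace and '=' with no shell-style unquoting."""
    return dict(tok[2:].split("=", 1) for tok in arg_string.split())

def parse_shell_like(arg_string):
    """Same, but let shlex strip the surrounding quotes first."""
    return dict(tok[2:].split("=", 1) for tok in shlex.split(arg_string))

naive = parse_naive(flag)["input"]
shell_like = parse_shell_like(flag)["input"]

print(repr(naive))       # the quotes are still part of the file name
print(repr(shell_like))  # the quotes have been removed
```

If the consumer of the flag string behaves like `parse_naive`, it would look for a file whose name starts and ends with a quote character, which does not exist even though the unquoted file does.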

@puraminy
Author

puraminy commented Jul 9, 2020

The problem is related to the fastai or sentencepiece versions.
What happens instead is that a 'tmp' folder is created, along with files
named "cache_dir".vocab and "cache_dir".model, inside my current directory.

For a solution, you can refer to:

https://stackoverflow.com/questions/59788395/fastai-failed-initiation-of-language-model-in-sentence-piece-processor-cache?noredirect=1#comment110726963_59788395
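
One possible explanation for output landing in the current directory, offered as an assumption and not confirmed by this thread: if the model prefix reaches the trainer with a literal quote in front, it no longer starts with `/` and is therefore treated as a relative path. A minimal check with a hypothetical prefix:

```python
from pathlib import PurePosixPath

# Hypothetical prefix, for illustration only.
clean = PurePosixPath("/data/models/spm")
quoted = PurePosixPath(f'"{clean}"')  # same path with literal quotes around it

print(clean.is_absolute())   # absolute: starts with "/"
print(quoted.is_absolute())  # relative: starts with a quote character
print(quoted.parts)          # first component is the stray quote
```

A relative path is resolved against the working directory, which would be consistent with the stray folder and files described above.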
