Cannot run examples / pytest tests: #54

Open
repodiac opened this issue Nov 5, 2019 · 1 comment

Comments

repodiac commented Nov 5, 2019

I cannot get MultiFiT to work in my environment :-(

What I did was...

  • I checked out the repo and ran every "prepare..." script available.
  • I had to "pip install" the modules "fire" and "sacremoses", since they were available neither via the code nor via the "fastai" package (I installed the most recent version, 1.0.59).
  • I started pytest . or training according to the example: python -m ulmfit lm --dataset-path data/wiki/${LANG}-100 --tokenizer='f' --nl 3 --name 'orig' --max-vocab 60000 --lang ${LANG} --qrnn=False - train 10 --bs=50 --drop_mult=0 --label-smoothing-eps=0.0

RESULT: I always get a Unicode encoding error (UnicodeEncodeError or UnicodeDecodeError)

e.g. with the training command:

Max vocab: 60000
Cache dir: data/wiki/en-100/models/f60k
Model dir: data/wiki/en-100/models/f60k/lstm_orig.m
Wiki text was split to 28476 articles
Wiki text was split to 60 articles
Running tokenization lm...
Traceback (most recent call last):
  File "/home/user/miniconda/envs/py36/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/user/miniconda/envs/py36/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/app/work/ulmfit/__main__.py", line 188, in <module>
    fire.Fire(ULMFiT())
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fire/core.py", line 471, in _Fire
    target=component.__name__)
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fire/core.py", line 675, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/app/work/ulmfit/pretrain_lm.py", line 164, in train_lm
    data_lm = self.load_wiki_data(bs=bs) if data_lm is None else data_lm
  File "/app/work/ulmfit/pretrain_lm.py", line 246, in load_wiki_data
    **args)
  File "/app/work/ulmfit/pretrain_lm.py", line 254, in lm_databunch
    return self.databunch(name, bunch_class=TextLMDataBunch, *args, **kwargs)
  File "/app/work/ulmfit/pretrain_lm.py", line 279, in databunch
    **args)
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastai/text/data.py", line 202, in from_df
    if cls==TextLMDataBunch: src = src.label_for_lm()
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastai/data_block.py", line 480, in _inner
    self.process()
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastai/data_block.py", line 534, in process
    for ds,n in zip(self.lists, ['train','valid','test']): ds.process(xp, yp, name=n)
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastai/data_block.py", line 714, in process
    self.x.process(xp)
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastai/data_block.py", line 84, in process
    for p in self.processor: p.process(self)
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastai/text/data.py", line 296, in process
    for i in progress_bar(range(0,len(ds),self.chunksize), leave=False):
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastprogress/fastprogress.py", line 75, in __iter__
    if self.auto_update: self.update(i+1)
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastprogress/fastprogress.py", line 92, in update
    self.update_bar(val)
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastprogress/fastprogress.py", line 104, in update_bar
    else: self.on_update(val, f'{100 * val/self.total:.2f}% [{val}/{self.total} {elapsed_t}<{remaining_t}{end}]')
  File "/home/user/miniconda/envs/py36/lib/python3.6/site-packages/fastprogress/fastprogress.py", line 274, in on_update
    if printing(): WRITER_FN(to_write, end = '\r')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-35: ordinal not in range(128)
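
For what it's worth, this UnicodeEncodeError is raised while fastprogress writes the progress bar, which usually means the interpreter's stdout encoding is ASCII (e.g. a plain C/POSIX locale). A quick diagnostic sketch, not part of the original run:

    import sys
    import locale

    # If either of these reports 'ascii' / 'ANSI_X3.4-1968', any non-ASCII
    # character that fastprogress tries to print will trigger the
    # UnicodeEncodeError shown above.
    print(sys.stdout.encoding)
    print(locale.getpreferredencoding(False))

    # A common workaround (untested against this repo) is to launch the training
    # command with a UTF-8 locale, e.g. PYTHONIOENCODING=utf-8 and/or
    # LC_ALL=C.UTF-8 set in the environment.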

or with the tests (any!):

self = <encodings.ascii.IncrementalDecoder object at 0x7fdcdf958e10>
input = b' \n = Valkyria Chronicles III = \n \n Senj\xc5\x8d no Valkyria 3 : <unk> Chronicles ( Japanese : \xe6\x88\xa6\xe5\xa...n force invading the Empire just following the two nations \' cease @-@ fire would certainly wreck their newfound peac'
final = False

    def decode(self, input, final=False):
>       return codecs.ascii_decode(input, self.errors)[0]
E       UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 39: ordinal not in range(128)

/home/user/miniconda/envs/py36/lib/python3.6/encodings/ascii.py:26: UnicodeDecodeError
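
The test failures point the same way: the wikitext fixture is UTF-8, but it is being decoded with the ASCII codec, which is what open() falls back to when no encoding is passed and the locale's preferred encoding is ASCII. A minimal illustration (the file path below is made up, not taken from the repo):

    import locale

    # open() without an explicit encoding uses the locale's preferred encoding;
    # under a C/POSIX locale that is ASCII, which reproduces the decode error
    # above on any UTF-8 wikitext file.
    print(locale.getpreferredencoding(False))

    # Passing the encoding explicitly avoids the failure (hypothetical path):
    with open("data/wiki/en-100/valid.txt", encoding="utf-8") as f:
        text = f.read()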

Does anyone have a clue here? Thanks a lot in advance!

@PiotrCzapla
Member

Hi @repodiac, it seems you are using the older scripts; python -m ulmfit lm doesn't look like the new framework. You might want to try running the current framework and see whether the issue is solved there.
I think I've seen "'ascii' codec can't encode characters" before, when loading tokenized datasets. The solution was to remove the cache files created by the older fastai and recreate them; the new framework does this automatically.
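
A rough sketch of that cache cleanup, assuming the cache dir printed in the log above (data/wiki/en-100/models/f60k); adjust the path to your own --dataset-path:

    import shutil
    from pathlib import Path

    # Remove the cached tokenization artifacts so they get rebuilt with the
    # currently installed fastai on the next run. The path is the "Cache dir"
    # from the log in this issue; change it to match your setup. Note that the
    # model dir (lstm_orig.m) also lives under this directory, so move anything
    # you want to keep out of the way first.
    cache_dir = Path("data/wiki/en-100/models/f60k")
    if cache_dir.exists():
        shutil.rmtree(cache_dir)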

Let me know if that helped.
