AttributeError: Can't get attribute 'tokenizer' on <module '__main__'> #81

Open
NielsHoogeveen1990 opened this issue Oct 12, 2018 · 6 comments

NielsHoogeveen1990 commented Oct 12, 2018

Hi,

I am trying to test the pickled objects, to verify that I can import the vectorizer and unpickle the classifier.

import pickle
import re
import os
from vectorizer import vect

clf = pickle.load(open(os.path.join('pkl_objects', 'classifier.pkl'), 'rb'))

However, I get this error:

----> 6 clf = pickle.load(open(os.path.join('pkl_objects', 'classifier.pkl'), 'rb'))

AttributeError: Can't get attribute 'tokenizer' on <module '__main__'>

What is going on and how can I fix this error?

Thank you!


rasbt commented Oct 12, 2018

Hm, this looks like some namespace issue. Pickle is very sensitive about that. Do you have the tokenizer defined in the vectorizer.py file, like so?

   import re
   from sklearn.feature_extraction.text import HashingVectorizer

   # `stop` is assumed to be the stop-word list loaded earlier in vectorizer.py.
   def tokenizer(text):
       text = re.sub(r'<[^>]*>', '', text)
       emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)',
                              text.lower())
       text = re.sub(r'[\W]+', ' ', text.lower()) \
              + ' '.join(emoticons).replace('-', '')
       tokenized = [w for w in text.split() if w not in stop]
       return tokenized


   vect = HashingVectorizer(decode_error='ignore',
                            n_features=2**21,
                            preprocessor=None,
                            tokenizer=tokenizer)
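
To illustrate why pickle is sensitive to namespaces: a pickled function is stored only as a reference (module name plus function name), not as code, so the same name has to be importable from the same module at load time. The following is a minimal sketch of that behavior, not the code from this repository. The in-memory module `training_script` is a hypothetical stand-in for whatever script created the pickle, and the toy `tokenizer` is an assumption for illustration.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the script that trained and pickled the model.
training = types.ModuleType("training_script")
sys.modules["training_script"] = training

# Define tokenizer inside that module, so its __module__ is "training_script".
exec("def tokenizer(text):\n    return text.split()", training.__dict__)
original = training.tokenizer

# Pickling a function stores only its module name and function name.
payload = pickle.dumps(original)

# Simulate loading in a session where the module does not define the function.
del training.tokenizer
try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:
    error = exc  # same kind of error as reported in this issue

# Once the name exists again under the same module, loading succeeds.
training.tokenizer = original
restored = pickle.loads(payload)
```

In the situation reported here, the module in question would be `__main__`, which suggests the pickle was created by a script or notebook that defined `tokenizer` itself.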

@rasbt rasbt added the question label Jun 16, 2019
@raybellwaves

Also have this issue. Can confirm vectorizer.py looks like that.


raybellwaves commented Jul 21, 2019

For what it's worth, doing:

>>> import pickle
>>> import re
>>> import os
>>> from vectorizer import *
>>> clf = pickle.load(open(os.path.join('pkl_objects', 'classifier.pkl'), 'rb'))
>>> clf

That seemed to work for me, thanks to some digging around here: https://stackoverflow.com/questions/40287657/load-pickled-object-in-different-file-attribute-error#comment67835396_40287657
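
The star import presumably works because it makes `tokenizer` available in the running `__main__` namespace, which is where the pickle looks for it. A different technique, not the one used in this thread, is to subclass `pickle.Unpickler` and override `find_class` so that the lookup is redirected to the module that really defines the function. The sketch below assumes the pickle refers to `__main__.tokenizer`. The in-memory `vectorizer` module, the toy `tokenizer`, the class name `RedirectingUnpickler`, and the hand-written payload are all hypothetical stand-ins so the example runs on its own.

```python
import io
import pickle
import sys
import types

# Hypothetical stand-in for vectorizer.py, which defines tokenizer.
vectorizer = types.ModuleType("vectorizer")
sys.modules["vectorizer"] = vectorizer
exec("def tokenizer(text):\n    return text.split()", vectorizer.__dict__)


class RedirectingUnpickler(pickle.Unpickler):
    """Look up __main__.tokenizer in the vectorizer module instead."""

    def find_class(self, module, name):
        if module == "__main__" and name == "tokenizer":
            module = "vectorizer"
        return super().find_class(module, name)


# Hand-written protocol-0 pickle that refers to __main__.tokenizer,
# standing in for a file created by a script that defined tokenizer itself.
payload = b"c__main__\ntokenizer\n."

loaded = RedirectingUnpickler(io.BytesIO(payload)).load()
```

With a real file, the same class would be given the open file object instead of the `io.BytesIO` wrapper.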

@raybellwaves

Ah, I think I know the source of my issue. I was pickling the clf after the logistic regression model, whereas the textbook does the pickling after the SGDClassifier.


rasbt commented Jul 21, 2019

Thanks for the comments. Hm, that's weird: the SGDClassifier should behave exactly the same as the LogisticRegression classifier when it comes to pickling. In most cases, it's a namespace issue. Hope you were able to resolve it.

@raybellwaves

Also, I realized I pickled the Pipeline object by pickling the earlier classifier...

clf = gs_lr_tfidf.best_estimator_
