
exception: enum PyPreTokenizerTypeWrapper while loading the fine-tuned model for evaluation #520

Open
Prashant-Baliyan opened this issue May 7, 2024 · 7 comments

@Prashant-Baliyan

Hi - we are currently fine-tuning the model "paraphrase-multilingual-MiniLM-L12-v2" for our use case. In our pipeline we have a model validation step where we load the trained model with:

model = SetFitModel.from_pretrained(model_dir)

but unfortunately we are getting the exception below:

Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 83 column 3
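
For context, this untagged-enum error typically means that an older tokenizers library is trying to deserialize a tokenizer.json that was written by a newer one, so the versions on the two instances are worth comparing. A minimal diagnostic sketch (assuming all three packages are importable on the instance being checked):

    # Run this on both the training and the validation instance; if the
    # training instance reports a newer tokenizers than the validation
    # instance, the saved tokenizer.json may not be parseable at load time.
    import tokenizers
    import transformers
    import setfit

    print("tokenizers  :", tokenizers.__version__)
    print("transformers:", transformers.__version__)
    print("setfit      :", setfit.__version__)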

Note: I am using the Amazon SageMaker platform for fine-tuning, with the configuration below.

For training:
instance_type: "ml.g5.2xlarge"
instance_count: 1
transformers_version: "4.28.1"
pytorch_version: "2.0.0"
setfit_version: "0.7.0"
py_version: "py310"

For validation:
instance_type: "ml.t3.xlarge"
instance_count: 1
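
For reference, these training settings would map onto the SageMaker Python SDK's HuggingFace estimator roughly as sketched below; entry_point, source_dir, and role are placeholders, and the setfit==0.7.0 pin would live in a requirements.txt next to the training script (all assumptions, since the original job definition is not shown):

    from sagemaker.huggingface import HuggingFace

    # Hypothetical reconstruction of the training job; the script name,
    # source directory, and IAM role below are placeholders.
    estimator = HuggingFace(
        entry_point="train.py",
        source_dir="src",  # assumed to contain requirements.txt pinning setfit==0.7.0
        role="arn:aws:iam::123456789012:role/SageMakerRole",
        instance_type="ml.g5.2xlarge",
        instance_count=1,
        transformers_version="4.28.1",
        pytorch_version="2.0.0",
        py_version="py310",
    )
    estimator.fit()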

It was working fine with the above configuration, but for the last couple of days we have been getting the above-mentioned exception. It would be great if anyone could help us fix the issue.

Do let me know if any other information is required from our side.

@PedroGarciasPainkillers

+1

jmatzat commented May 8, 2024

I solved the error by updating the tokenizers and transformers libraries with pip -U.
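
(Presumably that amounts to something like the following; the exact package list was not stated, so take it as my reading of the comment:)

    pip install -U tokenizers transformers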

@Prashant-Baliyan (Author)

@jmatzat - which versions of tokenizers and transformers do I need to go with? As you can see above, I am using this transformers version:
transformers_version: "4.28.1"

And one more thing: where do I have to run the pip -U command to update the versions, during fine-tuning or during validation? We are using different instances for the two.

jmatzat commented May 8, 2024

I encountered the problem while loading the SetFit model with from_pretrained.

tokenizers_version: 0.19.1

transformers_version: 4.40.2

You might have to update scikit-learn as well, after updating tokenizers and transformers.

@Prashant-Baliyan (Author)

@jmatzat I tried to update the tokenizers and transformers versions, but ended up with the errors below:

  • transformers 4.30.2 requires tokenizers!=0.11.3,<0.14,>=0.11.1, but you have tokenizers 0.19.1 which is incompatible.

  • from setfit import SetFitModel
    ImportError: tokenizers>=0.11.1,!=0.11.3,<0.14 is required for a normal functioning of this module, but found tokenizers==0.19.1.

jmatzat commented May 9, 2024

Have you tried updating setfit as well?

Updating tokenizers and transformers might require you to update other packages that depend on them as well.

Prashant-Baliyan (Author) commented May 9, 2024

@jmatzat - yes, I tried with two setfit versions, setfit==0.7.0 and 1.0.3. But as you can see, the transformers version itself is not compatible with the tokenizers version in the first place.
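
(For anyone hitting the same wall: the conflict quoted above arises because transformers stayed at 4.30.2, which pins tokenizers<0.14, while tokenizers alone was bumped to 0.19.1. Upgrading the whole stack in one step should give a mutually consistent set; the pins below match the versions jmatzat reported, but verify them against your own environment:)

    pip install -U "setfit==1.0.3" "transformers==4.40.2" "tokenizers==0.19.1"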
