
exception: enum PyPreTokenizerTypeWrapper while loading the fine-tuned model for evaluation #520

Open
Prashant-Baliyan opened this issue May 7, 2024 · 7 comments

@Prashant-Baliyan

Hi - we are currently fine-tuning the model "paraphrase-multilingual-MiniLM-L12-v2" for our use case. In our pipeline we have a model validation step where we load the trained model with:

model = SetFitModel.from_pretrained(model_dir)

but unfortunately we are getting the exception below:

Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 83 column 3
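
For context, this untagged-enum error typically means that an older tokenizers library is trying to deserialize a tokenizer.json that was written by a newer one, so the versions on the two instances are worth comparing. A minimal diagnostic sketch (assuming all three packages are importable on the instance being checked):

    # Run this on both the training and the validation instance; if the
    # training instance reports a newer tokenizers than the validation
    # instance, the saved tokenizer.json may not be parseable at load time.
    import tokenizers
    import transformers
    import setfit

    print("tokenizers  :", tokenizers.__version__)
    print("transformers:", transformers.__version__)
    print("setfit      :", setfit.__version__)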

Note: I am using the Amazon SageMaker platform for fine-tuning, with the configuration below.

For training:
instance_type: "ml.g5.2xlarge"
instance_count: 1
transformers_version: "4.28.1"
pytorch_version: "2.0.0"
setfit_version: "0.7.0"
py_version: "py310"

For validation:
instance_type: "ml.t3.xlarge"
instance_count: 1
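
For reference, these training settings would map onto the SageMaker Python SDK's HuggingFace estimator roughly as sketched below; entry_point, source_dir, and role are placeholders, and the setfit==0.7.0 pin would live in a requirements.txt next to the training script (all assumptions, since the original job definition is not shown):

    from sagemaker.huggingface import HuggingFace

    # Hypothetical reconstruction of the training job; the script name,
    # source directory, and IAM role below are placeholders.
    estimator = HuggingFace(
        entry_point="train.py",
        source_dir="src",  # assumed to contain requirements.txt pinning setfit==0.7.0
        role="arn:aws:iam::123456789012:role/SageMakerRole",
        instance_type="ml.g5.2xlarge",
        instance_count=1,
        transformers_version="4.28.1",
        pytorch_version="2.0.0",
        py_version="py310",
    )
    estimator.fit()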

It was working fine with the above configuration, but for the last couple of days we have been getting the above-mentioned exception. It would be great if anyone could help us fix the issue.

Do let me know if any other information is required from our side.

@PedroGarciasPainkillers

+1

jmatzat commented May 8, 2024

I solved the error by updating the tokenizers and transformers libraries with pip -U.
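
(Presumably that amounts to something like the following; the exact package list was not stated, so take it as my reading of the comment:)

    pip install -U tokenizers transformers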

@Prashant-Baliyan (Author)

@jmatzat - which versions of tokenizers and transformers do I need to go with? As you can see above, I am using this transformers version:
transformers_version: "4.28.1"

And one more thing: where do I have to run the pip -U command to update the versions, during fine-tuning or during validation? We are using different instances for the two.

jmatzat commented May 8, 2024

I encountered the problem while loading the SetFit model with from_pretrained.

tokenizers_version: 0.19.1

transformers_version: 4.40.2

You might have to update scikit-learn as well, after updating tokenizers and transformers.

@Prashant-Baliyan (Author)

@jmatzat I tried to update the tokenizers and transformers versions, but ended up with the errors below:

  • transformers 4.30.2 requires tokenizers!=0.11.3,<0.14,>=0.11.1, but you have tokenizers 0.19.1 which is incompatible.

  • from setfit import SetFitModel
    ImportError: tokenizers>=0.11.1,!=0.11.3,<0.14 is required for a normal functioning of this module, but found tokenizers==0.19.1.

jmatzat commented May 9, 2024

Have you tried updating setfit as well?

Updating tokenizers and transformers might require you to update other packages that depend on them as well.

Prashant-Baliyan (Author) commented May 9, 2024

@jmatzat - yes, I tried with two setfit versions, setfit==0.7.0 and 1.0.3. But as you can see, the transformers version itself is not compatible with the tokenizers version in the first place.
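
(For anyone hitting the same wall: the conflict quoted above arises because transformers stayed at 4.30.2, which pins tokenizers<0.14, while tokenizers alone was bumped to 0.19.1. Upgrading the whole stack in one step should give a mutually consistent set; the pins below match the versions jmatzat reported, but verify them against your own environment:)

    pip install -U "setfit==1.0.3" "transformers==4.40.2" "tokenizers==0.19.1"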
