
Error in "pgml.transform" with "text2text-generation" and "bigscience/T0" #1349

Open

remote4me opened this issue Mar 3, 2024 · 2 comments
remote4me commented Mar 3, 2024

Environment: Ubuntu 22.04, self-hosted PostgresML installed in a PostgreSQL 13 database

I am running this SQL:

SELECT pgml.transform(
    task => '{
        "task" : "text2text-generation",
        "model" : "bigscience/T0"
    }'::JSONB,
    inputs => ARRAY[
        'Is the word ''table'' used in the same meaning in the two previous sentences? Sentence A: you can leave the books on the table over there. Sentence B: the tables in this book are very hard to read.'
    ]
) AS answer;
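
Judging by the traceback below, pgml.transform builds a Hugging Face pipeline under the hood, so the failing call should be roughly reproducible outside the database with plain transformers. A minimal sketch under that assumption (run with the Python from the pgml venv):

from transformers import pipeline

# Roughly what pgml.transform does for this task JSON: build a
# text2text-generation pipeline for bigscience/T0 and run the input.
pipe = pipeline("text2text-generation", model="bigscience/T0")
print(pipe(
    "Is the word 'table' used in the same meaning in the two previous "
    "sentences? Sentence A: you can leave the books on the table over "
    "there. Sentence B: the tables in this book are very hard to read."
))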

On the first attempt the query ran for more than 30 minutes and then failed with an error. I restarted the PostgreSQL service and got the same error; rebooting the machine did not help either.

Here is the error:

An error occurred when executing the SQL command:
SELECT pgml.transform(
    task => '{
        "task" : "text2text-generation",
        "model" : "bigscience/T0"
    }'::JSONB,
    inputs => ARRAY[
 ...

ERROR: Traceback (most recent call last):
  File "transformers.py", line 449, in transform
  File "transformers.py", line 418, in create_pipeline
  File "transformers.py", line 306, in __init__
  File "/var/lib/postgresml-python/pgml-venv/lib/python3.10/site-packages/transformers/pipelines/__init__.py", line 905, in pipeline
    framework, model = infer_framework_load_model(
  File "/var/lib/postgresml-python/pgml-venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 292, in infer_framework_load_model
    raise ValueError(
 ValueError: Could not load model bigscience/T0 with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>, <class 'transformers.models.t5.modeling_t5.T5ForConditionalGeneration'>). See the original errors:

while loading with AutoModelForSeq2SeqLM, an error is thrown:
Traceback (most recent call last):
  File "/var/lib/postgresml-python/pgml-venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 530, in load_state_dict
    return torch.load(
  File "/var/lib/postgresml-python/pgml-venv/lib/python3.10/site-packages/torch/serialization.py", line 1004, in load
    overall_storage = torch.UntypedStorage.from_file(f, False, size)
RuntimeError: unable to mmap 44541580809 bytes from file </var/lib/postgresql/.cache/huggingface/hub/models--bigscience--T0/snapshots/7920e3b4fd0027e20824cec6d1daea6130723fec/pytorch_model.bin>: Cannot allocate memory (12)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/postgresml-python/pgml-venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 539, in load_state_dict
    if f.read(7) == "version":
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/postgresml-python/pgml-venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 279, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "/var/lib/postgresml-python/pgml-venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
  File "/var/lib/postgresml-python/pgml-venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3306, in from_pretrained
    state_dict = load_state_dict(resolved_archive_file)
  File "/var/lib/postgresml-python/pgml-venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 551, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for '/var/lib/postgresql/.cache/huggingface/hub/models--bigscience--T0/snapshots/7920e3b4fd0027e20824cec6d1daea6130723fec/pytorch_model.bin' at '/var/lib/postgresql/.cache/huggingface/hub/models--bigscience--T0/snapshots/7920e3b4fd0027e20824cec6d1daea6130723fec/pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

while loading with T5ForConditionalGeneration, an error is thrown:
Traceback (most recent call last):
  File "/var/lib/postgresml-python/pgml-venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 530, in load_state_dict
    return torch.load(
  File "/var/lib/postgresml-python/pgml-venv/lib/python3.10/site-packages/torch/serialization.py", line 1004, in load
    overall_storage = torch.UntypedStorage.from_file(f, False, size)
RuntimeError: unable to mmap 44541580809 bytes from file </var/lib/postgresql/.cache/huggingface/hub/models--bigscience--T0/snapshots/7920e3b4fd0027e20824cec6d1daea6130723fec/pytorch_model.bin>: Cannot allocate memory (12)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/postgresml-python/pgml-venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 539, in load_state_dict
    if f.read(7) == "version":
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/postgresml-python/pgml-venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 279, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "/var/lib/postgresml-python/pgml-venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3306, in from_pretrained
    state_dict = load_state_dict(resolved_archive_file)
  File "/var/lib/postgresml-python/pgml-venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 551, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for '/var/lib/postgresql/.cache/huggingface/hub/models--bigscience--T0/snapshots/7920e3b4fd0027e20824cec6d1daea6130723fec/pytorch_model.bin' at '/var/lib/postgresql/.cache/huggingface/hub/models--bigscience--T0/snapshots/7920e3b4fd0027e20824cec6d1daea6130723fec/pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.


1 statement failed.

Execution time: 5.4s
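
The interesting part is the first exception in the chain: errno 12 (ENOMEM) from the mmap inside torch.load; the UnicodeDecodeError and OSError that follow are just transformers' fallback paths failing after that. A quick way to sanity-check the memory side on the host, outside PostgreSQL (a minimal sketch assuming a Linux host and the cache path from the traceback):

import os

# Checkpoint path taken from the traceback above.
ckpt = ("/var/lib/postgresql/.cache/huggingface/hub/models--bigscience--T0/"
        "snapshots/7920e3b4fd0027e20824cec6d1daea6130723fec/pytorch_model.bin")

ckpt_bytes = os.path.getsize(ckpt)
# Currently available physical memory (Linux-specific sysconf names).
avail_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_AVPHYS_PAGES")

print(f"checkpoint:    {ckpt_bytes / 1e9:.1f} GB")
print(f"available RAM: {avail_bytes / 1e9:.1f} GB")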

remote4me (Author) commented

I see two things there:
"Cannot allocate memory" and
"Unable to load weights from pytorch checkpoint file"

This machine has 16 GB of RAM, and the NVIDIA GPU has 4 GB of VRAM.

nvidia-smi
Sun Mar  3 22:35:06 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro M1200                   On  | 00000000:01:00.0  On |                  N/A |
| N/A   47C    P0              N/A / 200W |    473MiB /  4096MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
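
If the host-RAM limit is the real problem, one thing worth trying outside PostgresML is transformers' low-memory loading path, which avoids materializing the whole state dict at once. A hedged sketch only (low_cpu_mem_usage requires the accelerate package, and a checkpoint this large may still not fit):

import torch
from transformers import AutoModelForSeq2SeqLM

# Sketch only: stream weights in instead of loading the full ~44.5 GB
# state dict at once, and halve the footprint with fp16.
model = AutoModelForSeq2SeqLM.from_pretrained(
    "bigscience/T0",
    low_cpu_mem_usage=True,   # needs 'accelerate' installed
    torch_dtype=torch.float16,
)

Even in fp16 the weights are around 22 GB, so this probably still will not fit in 16 GB of RAM without swap or offloading.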

remote4me (Author) commented

Wait: 44541580809 bytes ≈ 44.5 GB.
I don't see how that is supposed to fit into 16 GB of RAM...
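
The size checks out: T0 is roughly an 11B-parameter model, and fp32 weights alone account for essentially the whole file. Back-of-the-envelope (parameter count approximate):

# bigscience/T0 has roughly 11B parameters (approximate figure).
params = 11.1e9

print(f"fp32 weights: {params * 4 / 1e9:.1f} GB")   # ~44.4 GB
print(f"fp16 weights: {params * 2 / 1e9:.1f} GB")   # ~22.2 GB
print(f"mmap attempt: {44541580809 / 1e9:.1f} GB")  # ~44.5 GB, from the error above

So even half precision would not fit in 16 GB of RAM; a machine with much more memory, or a smaller variant such as bigscience/T0_3B, would be needed.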
