Pkg_resources is not defined in python docker #10265

Closed
neman-je opened this issue May 10, 2024 · 4 comments

Comments

@neman-je

Hi all,

For some time now, I have been maintaining a service that my colleagues developed as a Python app. I have a Docker image that worked perfectly until recently, when they introduced a dependency on XGBoost. Since then, the Python script has been failing with the error “pkg_resources is not defined”. Here is the relevant Dockerfile content:

FROM python:3.10.5 as python-base

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM mcr.microsoft.com/dotnet/aspnet:6.0 AS base

COPY --from=python-base /usr/local /usr/local

ENV LD_LIBRARY_PATH /usr/local/lib

WORKDIR /app
EXPOSE 80
EXPOSE 443

And here is my requirements file:

pandas==1.4.3
torch==1.12.0
pytorch_tabnet==3.1.1
pickle-mixin==1.0.2
pygrnn==0.1.2
xgboost==1.6.1

While trying to fix it, I also got the error “ImportError: libexpat.so.1: cannot open shared object file: No such file or directory”.

The line where it crashes:

nw = pickle.load(open(model_file, 'rb'))

I’ve tried installing various additional packages, replicating their local development environments, and adding C++ redistributable packages (as mentioned in the XGBoost documentation), but with no luck.

Any idea what my Docker image is missing?

@trivialfis
Member

trivialfis commented May 10, 2024

A pickled object is tied to the environment that produces it, including Python versions (you can run into invalid Python byte code when loading a Python object from a different Python version), XGBoost versions, and potentially all the dependencies that are loaded into the environment. It's basically a raw serialization.
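
To illustrate why a pickle depends on the environment that loads it, here is a minimal standard-library sketch. The module name `demo_model_lib` and its `Model` class are hypothetical stand-ins for a real library; the point is only that the pickle stores a reference to the class by module and name, so loading fails when that module is unavailable.

```python
import pickle
import sys
import types

# A throwaway module standing in for a third-party library (hypothetical name).
lib = types.ModuleType("demo_model_lib")
exec(
    "class Model:\n"
    "    def __init__(self, weights):\n"
    "        self.weights = weights\n",
    lib.__dict__,
)
sys.modules["demo_model_lib"] = lib

payload = pickle.dumps(lib.Model([0.1, 0.2]))

# The payload names the module and the class; it does not contain the class code.
references_by_name = b"demo_model_lib" in payload and b"Model" in payload

# Simulate loading in an environment where the library is absent.
del sys.modules["demo_model_lib"]
try:
    pickle.loads(payload)
    load_error = None
except ImportError as err:
    load_error = type(err).__name__

print(references_by_name, load_error)
```

The same mechanism applies when the module exists but has changed between versions: the pickle is resolved against whatever code the loading environment provides.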

Use save_model from XGBoost to export the model if you want to reuse it in a different environment. See https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html#introduction-to-model-io
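
A rough sketch of that approach, assuming the trained object is an `xgboost.Booster` (the scikit-learn style wrappers expose `save_model` and `load_model` as well). The helper names `export_booster` and `load_booster` and the file name `model.json` are illustrative, not part of XGBoost.

```python
def export_booster(booster, path="model.json"):
    """Write the model with XGBoost's own serializer instead of pickle."""
    # save_model chooses the JSON format from the ".json" file extension.
    booster.save_model(path)
    return path


def load_booster(path="model.json"):
    """Rebuild a Booster from a file written by export_booster."""
    # Imported here so this helper module can be imported without xgboost.
    import xgboost as xgb

    booster = xgb.Booster()
    booster.load_model(path)
    return booster
```

In the training environment you would call `export_booster(trained_booster)` once, ship `model.json` with the image, and call `load_booster()` in the service in place of `pickle.load`.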

@neman-je
Author

Hello @trivialfis

Thanks for commenting. I instructed my team to read this and they have sent me the updated implementation. Now, at the beginning of our python script, we have

import xgboost

and that alone already throws an error:

2024-05-14 10:30:18 Traceback (most recent call last):
2024-05-14 10:30:18   File "/usr/local/lib/python3.10/site-packages/xgboost/compat.py", line 105, in <module>
2024-05-14 10:30:18     import pkg_resources
2024-05-14 10:30:18   File "/usr/local/lib/python3.10/site-packages/pkg_resources/__init__.py", line 32, in <module>
2024-05-14 10:30:18     import plistlib
2024-05-14 10:30:18   File "/usr/local/lib/python3.10/plistlib.py", line 61, in <module>
2024-05-14 10:30:18     from xml.parsers.expat import ParserCreate
2024-05-14 10:30:18   File "/usr/local/lib/python3.10/xml/parsers/expat.py", line 4, in <module>
2024-05-14 10:30:18     from pyexpat import *
2024-05-14 10:30:18 ImportError: libexpat.so.1: cannot open shared object file: No such file or directory
2024-05-14 10:30:18 
2024-05-14 10:30:18 During handling of the above exception, another exception occurred:
2024-05-14 10:30:18 
2024-05-14 10:30:18 Traceback (most recent call last):
2024-05-14 10:30:18   File "/app/Logic/Implementations/5/XGBoost_deploy_json.py", line 5, in <module>
2024-05-14 10:30:18     import xgboost as xgb
2024-05-14 10:30:18   File "/usr/local/lib/python3.10/site-packages/xgboost/__init__.py", line 9, in <module>
2024-05-14 10:30:18     from .core import DMatrix, DeviceQuantileDMatrix, Booster, DataIter, build_info
2024-05-14 10:30:18   File "/usr/local/lib/python3.10/site-packages/xgboost/core.py", line 20, in <module>
2024-05-14 10:30:18     from .compat import STRING_TYPES, DataFrame, py_str, PANDAS_INSTALLED
2024-05-14 10:30:18   File "/usr/local/lib/python3.10/site-packages/xgboost/compat.py", line 108, in <module>
2024-05-14 10:30:18     except pkg_resources.DistributionNotFound:
2024-05-14 10:30:18 NameError: name 'pkg_resources' is not defined

Is there any documentation that lists the requirements for running XGBoost inside a Docker container?
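
The first traceback reports a missing shared library rather than a problem in XGBoost itself. One way to check this from inside the container is the small sketch below, which assumes a Linux image; the helper name `can_load` is made up for illustration.

```python
import ctypes


def can_load(lib_name: str) -> bool:
    """Return True if the dynamic loader can find and open lib_name."""
    try:
        ctypes.CDLL(lib_name)
    except OSError:
        # Raised when the library is missing or cannot be opened.
        return False
    return True


print("libexpat.so.1 loadable:", can_load("libexpat.so.1"))
```

If this prints `False` in the final image but `True` in the `python:3.10.5` stage, the library is present in the build stage but was not carried over by copying `/usr/local` alone.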

@trivialfis
Member

Thank you for the update. I looked again, and it appears to be an issue with pkg_resources. Its use was removed in XGBoost 1.7 and all later versions, so you may want to try the latest XGBoost instead.
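
The two chained tracebacks can be reproduced with a small standalone sketch. It mirrors the shape of the code shown in the traceback (an import inside `try`, with an `except` clause that refers to the imported module) but is not XGBoost's actual source; `demo_missing_dependency` is a hypothetical module name chosen so that the import fails.

```python
masked = None
try:
    try:
        import demo_missing_dependency  # hypothetical module; the import fails
    except demo_missing_dependency.SomeError:  # the name was never bound
        pass
except NameError as err:
    masked = err

# The NameError hides the original import failure, which survives as its context.
print(type(masked).__name__, "<-", type(masked.__context__).__name__)
```

Because the import never completes, evaluating the `except` clause raises `NameError`, and the original `ImportError` (here, the missing `libexpat.so.1`) only shows up as the earlier traceback in the chain.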

@neman-je
Author

Updating to 1.7.1 solved the problem.

I spent days trying to resolve this, thank you!
