NLTK Data Delivery #152

Open
gaetan-dion opened this issue Mar 16, 2021 · 0 comments
gaetan-dion commented Mar 16, 2021

Hi,

I have a question about how the NLTK modules (pretrained models, dictionaries, ...) are made available.
Today we can get these modules directly from the URL:

Or by using nltk.download(), which relies on the downloader.py module.
This works if you have internet access or a proxy available: nltk.set_proxy('xxx') (see the sketch below).
However, in most large companies you don't have that access: the environment can be too sensitive, and you have to guarantee compatibility between several libraries and modules, etc.
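For reference, a minimal sketch of the download path I mean; the proxy address and the 'punkt' package are only placeholders:

```python
import nltk

# Route the downloader through a corporate proxy (placeholder address).
nltk.set_proxy("http://proxy.example.com:3128")

# Fetch a resource from the central nltk_data index over the network.
nltk.download("punkt")

# Without network access this fails, unless the data has already been
# copied by hand into one of the directories listed in nltk.data.path.
```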

Today, data science relies on many libraries, and it is difficult to keep them compatible with each other. To address that, libraries like spaCy deliver their models through a conda repository and the conda package manager.
Example: https://github.com/explosion/spacy-models/releases/ => https://anaconda.org/conda-forge/spacy-model-de_core_news_sm

Thanks to conda, there is real delivery into a repository with artifact management, so the different modules can be obtained through a package manager that resolves dependency compatibility. We can manage them in a repository, and not just with a curl.
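For comparison, this is roughly what the spaCy workflow looks like once the model is delivered as a conda package (model name taken from the link above; treat the exact commands as an illustration):

```python
# Installed ahead of time from a conda channel or an internal mirror, e.g.:
#   conda install -c conda-forge spacy-model-de_core_news_sm
# conda resolves which spaCy version the model package is compatible with.
import spacy

# The model is now just an installed package; no download call at runtime.
nlp = spacy.load("de_core_news_sm")
doc = nlp("Das ist ein Beispiel.")
```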

Do you know if anything is planned for NLTK Data to manage delivery of the NLTK modules in this way?

gaetan-dion changed the title NLTK Data Delivey NLTK Data Delivery on Mar 16, 2021