NLTK Data Delivery #152
Hi,
I have a question about how the NLTK data modules (pretrained models, dictionaries, etc.) are made available.
Today we can get these modules directly by URL, or by using nltk.download(), which relies on the Downloader class in downloader.py. This works if you have internet access or a proxy available:
nltk.set_proxy('xxx')
However, in most large companies you do not have this kind of access: the environment can be too sensitive, and you have to guarantee compatibility between several libraries and modules.
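For context, the manual workaround in an offline environment looks roughly like the sketch below. It assumes a data archive has already been copied to the machine by some internal means, and that NLTK picks up extra data directories from the NLTK_DATA environment variable when it is imported. The function name install_local_package and the directory layout are hypothetical, for illustration only.

```python
import os
import zipfile
from pathlib import Path


def install_local_package(archive: Path, data_root: Path, category: str) -> Path:
    """Unpack a pre-downloaded data archive under data_root/category.

    Hypothetical helper: it only mirrors the <root>/<category>/<package>
    layout that NLTK data directories use and does not call NLTK itself.
    """
    target = data_root / category
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    # Assumption: NLTK reads NLTK_DATA at import time, so this must be set
    # before `import nltk` runs in the same process.
    os.environ["NLTK_DATA"] = str(data_root)
    return target
```

This works, but every archive has to be copied by hand and nothing checks that the data version matches the installed libraries.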
Data science today has plenty of libraries, and it is difficult to maintain compatibility between them. To address this, libraries like spaCy use the conda repository and the conda package manager.
Example: https://github.com/explosion/spacy-models/releases/ => https://anaconda.org/conda-forge/spacy-model-de_core_news_sm
Thanks to conda, there is a real delivery through a repository with artifact management, so the different modules can be installed with a package manager that resolves dependency compatibility. They can be managed through a repository, not just fetched with curl.
Do you know if anything is planned for NLTK Data to provide this kind of delivery for NLTK modules?