Enhancement: Provide option to modify cache folder for entity linker knowledge base downloads #415
Comments
I think you actually can do this, although admittedly I have not tried it. Can you try setting the SCISPACY_CACHE environment variable? See scispacy/scispacy/file_cache.py, Line 16 in 3d153dd.
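A minimal sketch of what setting the variable could look like, assuming the cache root is read from `SCISPACY_CACHE` when scispacy is first imported (an assumption drawn from this thread, not verified against the library). The helper name `point_scispacy_cache_at` is hypothetical and not part of scispacy. It creates the folder, sets the variable, and reports whether scispacy had already been imported, in which case the new value may not be picked up until the interpreter is restarted.

```python
import os
import sys
from pathlib import Path


def point_scispacy_cache_at(folder: str) -> bool:
    """Set SCISPACY_CACHE to `folder` and make sure the folder exists.

    Hypothetical helper, not part of scispacy. Returns True if scispacy
    has not been imported yet in this interpreter, False otherwise.
    """
    path = Path(folder).expanduser()
    # Create the target folder so later downloads have somewhere to go.
    path.mkdir(parents=True, exist_ok=True)
    os.environ["SCISPACY_CACHE"] = str(path)
    # Assumption: the variable is only read when scispacy is imported,
    # so setting it afterwards may have no effect until a restart.
    return "scispacy" not in sys.modules
```

In Colab this would be called with a Google Drive path before `import scispacy` runs; a `False` return value suggests the runtime needs a restart first.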
Makes sense. So it seems to pretty much be working, with a bit of a workaround. The files are initially cached in the default location. After caching, move the cache folder to a permanent folder on Google Drive:

    !mv /root/.scispacy/ /content/gdrive/MyDrive/test/
    !ls /content/gdrive/MyDrive/test/.scispacy/
    >>> datasets

Then update the environment variable, as described:

    import os
    os.environ['SCISPACY_CACHE'] = '/content/gdrive/MyDrive/test/.scispacy/'

However, this alone does not find the cached files; they are downloaded again. For the new environment variable to be seen, it's necessary to restart the runtime. Now when running the entity linker, it will see the permanently cached files.

So is an enhancement necessary? It'd definitely be easier and more foolproof to simply add a parameter such as

    nlp.add_pipe(
        "scispacy_linker",
        config={
            "resolve_abbreviations": True,
            "linker_name": "umls",
            "cache_folder": "/content/gdrive/MyDrive/test/"})

which would then be used to look for a subfolder
scispacy/scispacy/file_cache.py
Line 16 in 2290a80
For Google Colab users, the Path.home() location is /root/, which is deleted when the runtime is cleared. As runtimes are cleared fairly often, this means re-downloading the KBs. Perhaps there is a way to alter Path.home from pathlib? Another option is to allow the user to enter a cache folder, which Colab users could set to their Google Drive (fwiw just a regular folder as seen by Python within Colab), thus making the download permanent.
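The second option could be sketched as a small resolver. This is an illustration, not scispacy's code: the function name `resolve_cache_root` is hypothetical, the `cache_folder` argument mirrors the parameter proposed in the comments, and the fallback order (explicit folder, then `SCISPACY_CACHE`, then `.scispacy` under `Path.home()`) is an assumption.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_cache_root(cache_folder: Optional[str] = None) -> Path:
    """Choose where cached knowledge base files should live.

    Hypothetical precedence, highest first:
      1. an explicit `cache_folder` argument,
      2. the SCISPACY_CACHE environment variable,
      3. a `.scispacy` folder under the user's home directory.
    """
    if cache_folder is not None:
        # Whether a subfolder should be appended here is left open.
        return Path(cache_folder)
    env_value = os.environ.get("SCISPACY_CACHE")
    if env_value:
        return Path(env_value)
    return Path.home() / ".scispacy"
```

Because the lookup happens on each call rather than once at import, a folder passed this way would take effect without restarting the runtime. On the first question: on POSIX systems such as Colab's Linux runtime, `Path.home()` consults the `HOME` environment variable, so setting `HOME` before the library is imported is another possible route, though it affects every other tool that reads `HOME`.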