[FEATURE]: Cache in CLI #792
Comments
IMO it should be determined dynamically based on dataset size, i.e., any dataset with fewer than 50k molecules (just a random number) should be cached unless a user tells us not to ( notes:
|
I completely second @davidegraff: maybe use 50k or 100k as the default limit. |
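To make the proposal above concrete, here is a minimal sketch of a size-based caching rule. The function name `should_cache`, the `user_choice` parameter, and the constant `DEFAULT_CACHE_LIMIT` are hypothetical names chosen for illustration, not part of the project's API; the 50k value is just the placeholder number suggested in this thread.

```python
from typing import Optional

# Hypothetical default taken from the discussion (50k; 100k was also suggested).
DEFAULT_CACHE_LIMIT = 50_000


def should_cache(
    n_molecules: int,
    user_choice: Optional[bool] = None,
    limit: int = DEFAULT_CACHE_LIMIT,
) -> bool:
    """Decide whether to cache a dataset of ``n_molecules`` entries.

    An explicit user choice always wins; otherwise cache only when the
    dataset is smaller than ``limit``.
    """
    if user_choice is not None:
        return user_choice
    return n_molecules < limit
```

Under this rule a 10k-molecule dataset would be cached by default, a ~10M-molecule dataset would not, and passing `user_choice=False` would disable caching regardless of size.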
Can this be made available through the CLI? |
Not caching is the default (and only option currently) in the CLI, which works for all sizes of datasets. Soon we plan to add an option to cache for small datasets. I understand that your dataset is large. The CLI should work for your dataset as no caching is performed. |
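One way such an option might be exposed is sketched below with `argparse`. The flag names `--cache`/`--no-cache` and `--cache-limit`, the program name, and the helper `resolve_cache` are all assumptions made for illustration; they are not the CLI's actual flags, and the eventual design may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: none of these flags exist in the real CLI.
    parser = argparse.ArgumentParser(prog="train-sketch")
    parser.add_argument(
        "--cache",
        action=argparse.BooleanOptionalAction,  # also adds --no-cache (Python 3.9+)
        default=None,  # None means "decide from dataset size"
        help="force caching on or off; omit to decide automatically",
    )
    parser.add_argument(
        "--cache-limit",
        type=int,
        default=50_000,  # placeholder threshold from the discussion
        help="cache automatically only below this many molecules",
    )
    return parser


def resolve_cache(args: argparse.Namespace, n_molecules: int) -> bool:
    """Return the final caching decision for a dataset of the given size."""
    if args.cache is not None:
        return args.cache
    return n_molecules < args.cache_limit
```

With no flags, a large dataset would keep the no-caching behavior described above, while small datasets would be cached unless `--no-cache` is passed.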
Ok, do you want me to share a 10M public dataset w/ you so that you can reproduce the problem? |
~10M molecules; classification setting |
Yes, I can try running the CLI on it tomorrow and see if I can reproduce your error. Please send details of the dataset in issue #858. Thank you |
#697 added caching to v2. We haven't made it available through the CLI yet.