massive read only cache, missing something obvious? #1507
Comments
Have you considered using native multiprocessing.shared_memory?
Joblib is mostly targeted at simpler use cases of embarrassingly parallel jobs, and requirements such as shared resources diverge from this initial goal. That said, we could consider more generic APIs if this kind of feature becomes more and more requested.
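The suggestion above could be sketched as follows: copy the cache into a `multiprocessing.shared_memory` block once in the parent, then have each worker attach a zero-copy NumPy view by name. The function names and array contents here are illustrative, not part of any joblib API.

```python
import numpy as np
from multiprocessing import shared_memory


def create_shared_cache(data: np.ndarray) -> shared_memory.SharedMemory:
    """Copy `data` into a shared memory block once, in the parent process."""
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shared[:] = data  # one-time copy into shared memory
    return shm  # keep a reference so the block stays alive


def attach_cache(name: str, shape, dtype):
    """Attach to the existing block from a worker; no copy, no unpickling."""
    shm = shared_memory.SharedMemory(name=name)
    return shm, np.ndarray(shape, dtype=dtype, buffer=shm.buf)


if __name__ == "__main__":
    cache = np.arange(1_000_000, dtype=np.float64)
    shm = create_shared_cache(cache)
    try:
        # A worker would receive only (shm.name, cache.shape, cache.dtype),
        # which are cheap to pickle, instead of the multi-GB cache itself.
        worker_shm, view = attach_cache(shm.name, cache.shape, cache.dtype)
        assert view[123] == 123.0  # reads go straight to shared memory
        worker_shm.close()
    finally:
        shm.close()
        shm.unlink()  # free the block once the last process is done
```

The trade-off is manual lifetime management: the parent must `unlink()` the block when all workers are finished, and every process that attaches must `close()` its handle.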
I have workers that require access to a read-only cache several GB in size to do various things. When I left the cache as a global variable, joblib was very slow, so I started loading it from pickle on each spawn.
This improved performance dramatically, but it still loads the data on each spawn!
Admittedly, it is probably coming from OS-level cached I/O in memory (so disk reads are mostly skipped), but it still has to unpickle, there is some I/O overhead, and memory usage is multiplied across workers.
Is there a way to directly access a shared read-only object without the serialization/deserialization?
I thought this would be a first-class use case, and maybe it is so obvious that it doesn't get much documentation.
I tried passing around a memory object, but that didn't work, and the documentation doesn't mention this as a use case.
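For the read-only-cache pattern described above, one relevant joblib feature is its memmapping of large NumPy arrays passed to workers (controlled by the `max_nbytes` and `mmap_mode` parameters of `Parallel`): the array is dumped to disk once and workers open a read-only memory map instead of each receiving a pickled copy. A minimal sketch, with illustrative sizes and threshold:

```python
import numpy as np
from joblib import Parallel, delayed


def lookup(cache, idx):
    # With memmapping active, `cache` arrives in the worker as a
    # read-only np.memmap backed by a shared file, not a fresh copy.
    return float(cache[idx])


if __name__ == "__main__":
    cache = np.arange(10_000_000, dtype=np.float64)  # ~80 MB
    # Arrays larger than max_nbytes are memmapped read-only for workers.
    results = Parallel(n_jobs=2, max_nbytes="1M", mmap_mode="r")(
        delayed(lookup)(cache, i) for i in (0, 42, 99)
    )
    print(results)  # [0.0, 42.0, 99.0]
```

This avoids per-worker unpickling and duplicated resident memory for plain NumPy arrays, though it does not help for arbitrary Python objects, which still go through pickle.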