Shims are too slow #2802
Comments
With hard-coded linking, dynamic version selection won't work -- everything would constantly point to the version that was selected the last time.
Why is dynamic version selection needed? The location of a given executable only changes when installing, removing, or selecting a version. My hook handles this as long as you rehash after changing versions, but that's not what I'm advocating for. Rather, I want to move the version resolution from runtime to install/configure-time. I don't think it would be hard for pyenv to keep some symlinks in sync with the selected versions, possibly as install/remove/global/local hooks. Shims can be kept around, but if any of this sounds appealing I'd like to write some proof of concept code to see how feasible this is.
This is too inflexible even compared to my relink hook. Any change in version will require an alteration to PATH in every downstream process, and old entries will be kept around until you surgically remove them or reboot.
Just posting an example of where slow shims might be a problem. Launching the
Normally not a big deal, but it can slow things down in programs like Byobu, which implement status widgets whose values are updated by running small scripts on every interval, where that interval can be multiple times a second. For example, every status widget implemented here, which is all of them, imports this script, which launches Python for sanity checks. That order-of-magnitude increase in launch time is then compounded by the number of status widgets.
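To put a number on the per-launch cost described above, here is a minimal timing sketch. The `time_launches` helper is hypothetical and not part of pyenv or Byobu. It assumes GNU `date` with nanosecond `%N` support; hyperfine, which the issue uses, is the more rigorous tool.

```shell
# time_launches COUNT CMD...
# Run CMD COUNT times and print the average wall-clock time per launch in ms.
# Assumption: GNU date (supports %N). COUNT must be a positive integer.
time_launches() {
  local count="$1" i=0 start end
  shift
  start=$(date +%s%N)
  while [ "$i" -lt "$count" ]; do
    # Output and exit status are ignored; only the launch time matters here.
    "$@" >/dev/null 2>&1 || true
    i=$((i + 1))
  done
  end=$(date +%s%N)
  # Nanoseconds -> milliseconds, averaged over COUNT runs (integer division).
  echo $(( (end - start) / count / 1000000 ))
}
```

Comparing `time_launches 20 python -c pass` (through the shim) with `time_launches 20 "$(pyenv which python)" -c pass` (the resolved interpreter) should show how much of the launch time the shim itself accounts for on a given machine.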
Also when changing directories or a certain envvar.
We're all for optimizing the code. That just needs to be done without breaking it. Otherwise, users will report problems and we'll have to revert those "optimizations" back in order to fix them...
I’m going to take a stab at profiling and implementing the hot-path functions in C using loadable bash builtins (it seems there is already some precedent for readlink in src/). I think we would probably want to make it optional, while leaving the original bash implementations available, no? Thoughts, suggestions, yays, nays?
IMO a C substitute is acceptable but should be a last resort -- as we'll have to keep it in sync with the Bash code and, what is more important, it'll make it harder to diagnose users' problems since it won't be producing an execution trace.
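For readers unfamiliar with the execution trace mentioned above, here is a minimal sketch of what shell code gives you for free under `set -x`. Both functions are hypothetical stand-ins, not pyenv code. As far as I know, pyenv's `PYENV_DEBUG` switch enables this kind of trace, but treat that as an assumption.

```shell
# Illustrative stand-in for a resolution step written in shell (hypothetical).
resolve_version() {
  local name="$1"
  name="${name#python-}"   # strip an assumed "python-" prefix
  printf '%s\n' "$name"
}

# trace_of CMD...
# Run CMD in a subshell with xtrace enabled and print only the trace:
# stderr is redirected to our stdout, the command's own stdout is discarded.
trace_of() {
  ( set -x; "$@" ) 2>&1 >/dev/null
}
```

Running `trace_of resolve_version python-3.12` prints one line per executed command. A compiled builtin doing the same work would show up as a single opaque call, which is the diagnosability cost raised here.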
I agree that keeping things in sync could be a burden. That’s why I would opt to keep the original scripts around and have some sort of conformity/correctness test suite (perhaps strace log based?). Adding caching (knowing what to cache, where to put the results, how to do all the proper checks/interlocks) feels beyond my abilities given my current understanding of pyenv internals, but maybe after giving the C builtins a go I would be able to grasp the scope better. Anyways I did a bit of benchmarking.
Benchmark 3 vs 4 shows the difference between the top level use of
Now to try using uftrace and perhaps adding some SystemTap DTrace probes into bash for better visibility.
Prerequisite
`pyenv` and the default `python-build` plugin only. Please refrain from reporting issues of other plugins here.
Description
Related to #1883, the files pyenv installs in `shims` are bash scripts that spawn over a hundred processes and take far too long to execute. Benchmarking with hyperfine shows the shim alone takes ~250ms on my machine, and running with BASH_ENV set doubles that, even though my BASH_ENV only takes ~16ms to run (probably because so many bash processes are spawned). IMO this is an unbearable amount of time for a language interpreter to start up, especially in a tight loop.
This problem can be mitigated by porting pyenv to POSIX shell so dash can be used (I can contribute here if desired), but /bin/sh won't always be dash. Instead, I've bypassed the problem on my local machine by adding a `pyenv rehash` hook that goes through every shim, runs `pyenv which` for each one, then symlinks each one to `links`. Since bash isn't involved, python now starts up nearly instantly.
I want to implement this feature in pyenv so downstream projects can benefit from it too. I would argue shims should be replaced by symlinks, but they could be used as a fallback if symlinks aren't available on all platforms pyenv supports. I don't have intimate knowledge of pyenv's internals, so help on where to start would be appreciated.
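The relink idea described above can be sketched roughly as follows. This is not the hook attached to this issue. The function name, its arguments and the directory layout are assumptions made for illustration; the resolver is passed in so that `pyenv which` can be swapped for something else.

```shell
# relink_shims SHIMS_DIR LINKS_DIR RESOLVER...
# For every file in SHIMS_DIR, ask RESOLVER (e.g. `pyenv which`) for the real
# executable and point LINKS_DIR/<name> at it. Hypothetical helper.
relink_shims() {
  local shims_dir="$1" links_dir="$2"
  shift 2
  local shim name target

  mkdir -p "$links_dir"
  # Drop existing links first so removed commands do not linger.
  find "$links_dir" -mindepth 1 -maxdepth 1 -type l -delete

  for shim in "$shims_dir"/*; do
    [ -f "$shim" ] || continue
    name="${shim##*/}"
    # The resolver is expected to fail for commands that the currently
    # selected version(s) do not provide; those get no link.
    if target=$("$@" "$name" 2>/dev/null) && [ -x "$target" ]; then
      ln -sfn "$target" "$links_dir/$name"
    fi
  done
}

# Assumed usage from a rehash hook, with links/ placed ahead of shims/ in PATH:
#   relink_shims "$(pyenv root)/shims" "$(pyenv root)/links" pyenv which
```

Because resolution happens once at rehash time, the links only reflect the version selected at that moment. That is the trade-off raised in the comments: per-directory or environment-based version changes are not picked up until the next rehash.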
hyperfine benchmark command
pyenv.d/rehash/relink.bash