Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shims are too slow #2802

Open
5 tasks done
mrnoname1000 opened this issue Sep 29, 2023 · 8 comments
Open
5 tasks done

Shims are too slow #2802

mrnoname1000 opened this issue Sep 29, 2023 · 8 comments

Comments

@mrnoname1000
Copy link

mrnoname1000 commented Sep 29, 2023

Prerequisite

  • Make sure your problem is not listed in the common build problems.
  • Make sure no duplicated issue has already been reported in the pyenv issues. You should look for closed issues, too.
  • Make sure you are not asking us to help solving your specific issue.
    • GitHub issues is opened mainly for development purposes. If you want to ask someone to help solving your problem, go to some community site like Gitter, StackOverflow, etc.
  • Make sure your problem is not derived from packaging (e.g. Homebrew).
    • Please refer to the package documentation for the installation issues, etc.
  • Make sure your problem is not derived from plugins.
    • This repository is maintaining pyenv and the default python-build plugin only. Please refrain from reporting issues of other plugins here.

Description

Related to #1883‌, the files pyenv installs in shims are bash scripts that spawn over a hundred processes and take far too long to execute. Benchmarking with hyperfine shows the shim alone takes ~250ms on my machine, and running with BASH_ENV set doubles that, even though my BASH_ENV only takes ~16ms to run (probably because of so many bash processes being spawned). IMO this is an unbearable amount of time for a language interpreter to start up, especially in a tight loop.

This problem can be mitigated by porting pyenv to POSIX shell so dash can be used (I can contribute here if desired), but /bin/sh won't always be dash. Instead, I've bypassed the problem on my local machine by adding a pyenv rehash hook that goes through every shim, runs pyenv which for one, then symlinks each one to links. Since bash isn't involved, python now starts up nearly instantly.

I want to implement this feature in pyenv so downstream projects can benefit from it too. I would argue shims should be replaced by symlinks, but they could be used as a fallback if symlinks aren't available on all platforms pyenv supports. I don't have intimate knowledge of pyenv's internals, so help on where to start would be appreciated.

hyperfine benchmark command
env -u BASH_ENV hyperfine -N "$(pyenv which python3) -h" "${PYENV_ROOT-$HOME/.pyenv}/shims/python3 -h"
pyenv.d/rehash/relink.bash
root=${PYENV_ROOT:-$HOME/.pyenv}

test -e "$root"/links || mkdir -p -- "$_" || return
rm -f -- "$_"/* || return

find "$root"/shims -mindepth 1 -maxdepth 1 ! -name .pyenv-shim -type f |
	sort -V |
	while IFS= read -r shim; do
		# Escape dashes for GNU basename
		case $shim in (-*) shim=./$shim ;; esac

		name=$(basename "$shim") || break
		printf >&2 "%s\n" "$name"
		if target=$("$root"/bin/pyenv which "$name"); then
			link=$root/links/$name
			ln -s -- "$target" "$link" || break
		fi
	done
@native-api
Copy link
Member

native-api commented Sep 30, 2023

With hard-coded linking, dynamic version selection won't work -- everything would constantly point to the version that was selected the last time pyenv rehash ran.

You can use Pyenv without shims.

@mrnoname1000
Copy link
Author

mrnoname1000 commented Sep 30, 2023

Why is dynamic version selection needed? The location of a given executable only changes when installing, removing, or selecting a version. My hook handles this as long as you rehash after changing versions, but that's not what I'm advocating for. Rather I want to move the version resolution from runtime to install/configure-time. I don't think it would be hard for pyenv to keep some symlinks in-sync with the selected versions, possibly as install/remove/global/local hooks.

Shims can be kept around, but if any of this sounds appealing I'd like to write some proof of concept code to see how feasible this is.

You can use Pyenv without shims.

This is too inflexible even compared to my relink hook. Any change in version will require an alteration to PATH in every downstream process and old entries will be kept around until you surgically remove them or reboot.

@alexdelorenzo
Copy link

Just posting an example of where slow shims might be a problem.

Launching the python/python3 shim takes an order of magnitude longer to launch than native python:

$ time /usr/bin/python --version > /dev/null
real    0m0.007s

$ time python --version > /dev/null
real    0m0.071s

$ which python
/home/user/.pyenv/shims/python

Normally not a big deal, but it can slow thing down in programs like Byobu, that implement status widgets whose values are updated running small scripts on every interval, where that interval can be multiple times a second.

For example, every status widget implemented here, which is all of them, imports this script, which launches Python for sanity checks.

That order of magnitude increase of launch time is then compounded by the amount of status widgets.

@native-api
Copy link
Member

native-api commented Oct 11, 2023

The location of a given executable only changes when installing, removing, or selecting a version.

Also when changing directories or a certain envvar.
Or something in Pyenv has changed since last time -- e.g. you added/removed a plugin with an exec hook.

@native-api
Copy link
Member

native-api commented Oct 11, 2023

We're all for optimizing the code. That just needs to be done without breaking it. Otherwise, users will report problems and we'll have to revert those "optimizations" back in order to fix them...

@jevinskie
Copy link

I’m going to give a stab at profiling and implementing the hot-path functions in C using loadable bash builtins (it seems there is already some precedent for readlink in src/). I think we would probably want to make it optional, while leaving the original bash implementations available, no?

Thoughts, suggestions, yays, nays?

@native-api
Copy link
Member

native-api commented Nov 28, 2023

IMO a C substitute is acceptable but should be a last resort -- as we'll have to keep it in sync with the Bash code and, what is more important, it'll make it harder to diagnose users' problems since it won't be producing an execution trace.
Since Pyenv checks a lot of places and calls a lot of stuff every time, I think a big speedup is achievable with just not doing the same stuff in multiple places and caching things like directory mtimes to avoid checking them every time.

@jevinskie
Copy link

I agree that keeping things in sync could be a burden. That’s why I would opt to keep the original scripts around and have some sort of conformity/correctness test suite (perhaps strace log based?). Adding caching (knowing what to cache, where to put the results, how to do all proper checks/interlocks) feels like beyond my abilities given my current understanding of pyenv internals, but maybe after giving the C builtins a go I would be able to grasp the scope better.

Anyways I did a bit of benchmarking. mintrue is a < 900 byte static-PIE ELF with _start that just calls SYS_exit with 0. bash was compiled and statically linked against readline and ncurses and gettext as -fPIE against musl libc.so where everything was built with clang-18 using -O3 -finstrument-functions (no -finstrument-functions on libc.so itself) where the enter/exit profiling functions were left as a NOP ret stub.

Benchmark 1: /home/jevin/.pyenv/versions/3.11.6/bin/mintrue
  Time (mean ± σ):      75.4 µs ±  23.2 µs    [User: 54.0 µs, System: 0.7 µs]
  Range (min … max):    61.7 µs … 779.7 µs    19584 runs
  
Benchmark 2: /home/jevin/code/bash/prefix-profile-bash/bin/bash -c /home/jevin/.pyenv/versions/3.11.6/bin/mintrue
  Time (mean ± σ):     446.1 µs ±  34.7 µs    [User: 396.7 µs, System: 27.7 µs]
  Range (min … max):   387.1 µs … 629.2 µs    6298 runs
  
Benchmark 3: /home/jevin/code/bash/prefix-profile-bash/bin/bash /home/jevin/.pyenv/shims/mintrue
  Time (mean ± σ):     155.0 ms ±  12.7 ms    [User: 86.3 ms, System: 73.3 ms]
  Range (min … max):    28.2 ms … 162.6 ms    1000 runs
  
Benchmark 4: /home/jevin/.pyenv/shims/mintrue
  Time (mean ± σ):     157.3 ms ±   7.4 ms    [User: 87.5 ms, System: 74.5 ms]
  Range (min … max):    51.7 ms … 164.8 ms    1000 runs
  
Summary
  /home/jevin/.pyenv/versions/3.11.6/bin/mintrue ran
    5.92 ± 1.88 times faster than /home/jevin/code/bash/prefix-profile-bash/bin/bash -c /home/jevin/.pyenv/versions/3.11.6/bin/mintrue
 2056.91 ± 655.60 times faster than /home/jevin/code/bash/prefix-profile-bash/bin/bash /home/jevin/.pyenv/shims/mintrue
 2087.61 ± 650.43 times faster than /home/jevin/.pyenv/shims/mintrue

Benchmark 3 vs 4 shows the difference between the top level use of env to invoke bash (but sub calls to other pyenv scripts still use env).

Full results

Now to try using uftrace and perhaps adding some some SystemTap DTrace probes into bash for better visibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants