Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

E0401 (import-error) checks perform many isinstance calls when searching for zipimporters #9607

Closed
correctmost opened this issue May 7, 2024 · 0 comments · Fixed by pylint-dev/astroid#2428
Labels
Astroid Related to astroid Enhancement ✨ Improvement to a component Needs PR This issue is accepted, sufficiently specified and now needs an implementation performance

Comments

@correctmost
Copy link

correctmost commented May 7, 2024

Bug description

In astroid, there is a _get_zipimporters function that scans sys.path_importer_cache for zipimporter finders:

def _get_zipimporters() -> Iterator[tuple[str, zipimport.zipimporter]]:
    for filepath, importer in sys.path_importer_cache.items():
        if isinstance(importer, zipimport.zipimporter):
            yield filepath, importer

The isinstance call gets executed ~35 million times when running pylint on the yt-dlp repo. ~32 million of these invocations can be avoided by checking for None first:

if importer is not None and isinstance(importer, zipimport.zipimporter)

A further optimization would be to reduce the number of times path_importer_cache is scanned, but that seems trickier to accomplish.

Configuration

[MAIN]
jobs=1

[MESSAGES CONTROL]
disable=all
enable=E0401

[REPORTS]
reports=no
score=no

Command used

Steps to reproduce

git clone https://github.com/yt-dlp/yt-dlp.git
cd yt-dlp
git checkout 5904853ae5788509fdc4892cb7ecdfa9ae7f78e6

cat << EOF > ./profile_pylint.py
import cProfile
import pstats
import sys

sys.argv = ['pylint', '--recursive=y', '.']
cProfile.run('from pylint import __main__', filename='stats')

with open('profiler_stats', 'w', encoding='utf-8') as file:
    stats = pstats.Stats('stats', stream=file)
    stats.sort_stats('tottime')
    stats.print_stats()
EOF

cat << EOF > .pylintrc
[MAIN]
jobs=1

[MESSAGES CONTROL]
disable=all
enable=E0401

[REPORTS]
reports=no
score=no
EOF

python ./profile_pylint.py

Analysis

_get_zipimporters calls isinstance ~35 million times

import pstats

stats = pstats.Stats('stats')
stats.print_callers('isinstance')

  ncalls  tottime  cumtime
35188734    4.255    4.255  astroid/interpreter/_import/spec.py:342(_get_zipimporters)

Checking for None first seems twice as fast:

$ python -m timeit -s 'import zipimport; x=None' -n 1000000 'isinstance(x, zipimport.zipimporter)' 
1000000 loops, best of 5: 48.9 nsec per loop

$ python -m timeit -s 'import zipimport; x=None' -n 1000000 'x is not None and isinstance(x, zipimport.zipimporter)' 
1000000 loops, best of 5: 24.3 nsec per loop

Pylint output

There may be some import errors depending on your (virtual) environment, but the output is less important than the performance numbers.

Expected behavior

Improved performance via reduced isinstance calls

Pylint version

astroid @ git+https://github.com/pylint-dev/astroid.git@a4a9fcc44ae0d71773dc3bff6baa78fc571ecb7d
pylint @ git+https://github.com/pylint-dev/pylint.git@500774ae5a4e49e2aa0c8d3f2b64613e21aa676e
Python 3.12.3

OS / Environment

Arch Linux

Additional dependencies

No response

@correctmost correctmost added the Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling label May 7, 2024
@Pierre-Sassoulas Pierre-Sassoulas added Enhancement ✨ Improvement to a component performance Needs PR This issue is accepted, sufficiently specified and now needs an implementation Astroid Related to astroid and removed Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels May 7, 2024
correctmost added a commit to correctmost/astroid that referenced this issue May 10, 2024
_get_zipimporters can call isinstance millions of times when
running pylint's import-error checker on a codebase like yt-dlp.

Checking for None first avoids the overhead of invoking isinstance.

Closes pylint-dev/pylint#9607.
DanielNoord pushed a commit to pylint-dev/astroid that referenced this issue May 11, 2024
_get_zipimporters can call isinstance millions of times when
running pylint's import-error checker on a codebase like yt-dlp.

Checking for None first avoids the overhead of invoking isinstance.

Closes pylint-dev/pylint#9607.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Astroid Related to astroid Enhancement ✨ Improvement to a component Needs PR This issue is accepted, sufficiently specified and now needs an implementation performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants