Performance issue on Ubuntu #205

Open
anttimc opened this issue Nov 15, 2023 · 5 comments · May be fixed by #210

Comments

anttimc commented Nov 15, 2023

Getting the CPU info on Ubuntu takes over one second. I noticed this slowdown in PyTables, where the function is called at import time.

  • Operating System: Ubuntu 20.04.6 LTS (in a Docker container)
  • Python version: 3.11.6
  • py-cpuinfo version: 9.0.0

Simple benchmark on Ubuntu (in a Docker container):

>>> import time
>>> import cpuinfo
>>> start=time.time(); cpu_info = cpuinfo.get_cpu_info(); print(time.time() - start)
1.0749070644378662

On Fedora (the same machine and CPU):

>>> start=time.time(); cpu_info = cpuinfo.get_cpu_info(); print(time.time() - start)
0.07710409164428711

anttimc commented Nov 30, 2023

I managed to track down the issue with line_profiler. The time is spent in CPUID.get_raw_hz.

CPUID.get_raw_hz sleeps for one second to count the number of ticks!
https://github.com/workhorsy/py-cpuinfo/blob/4824ec0746be0dcee9bf9528dc8cdd0c1640cd9d/cpuinfo/cpuinfo.py#L1508C1-L1521C1

I think this kind of waiting time is totally unacceptable. I wonder why I don't see this issue on Fedora? Probably something in the try block fails before the sleep.
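
To illustrate why a measurement of this kind cannot be fast, here is a minimal sketch of the general "sample a counter, sleep, sample again" pattern. It is not the py-cpuinfo code: `estimate_rate_hz` is a hypothetical helper, and `time.perf_counter_ns` is only a stand-in for the CPU tick counter. Whatever the counter is, the call blocks for at least the sleep interval.

```python
import time

def estimate_rate_hz(read_counter, interval_s=1.0):
    """Estimate how fast a counter advances by sampling it around a sleep.

    Hypothetical helper for illustration only; not part of py-cpuinfo.
    """
    start = read_counter()
    time.sleep(interval_s)  # the whole call blocks for at least this long
    end = read_counter()
    return (end - start) / interval_s

# Stand-in counter: nanoseconds from a monotonic clock.
t0 = time.perf_counter()
rate = estimate_rate_hz(time.perf_counter_ns, interval_s=0.05)
elapsed = time.perf_counter() - t0
print(f"estimated rate: {rate:.3e} ticks/s, call took {elapsed:.3f} s")
```

With `interval_s=1.0`, the same pattern would add roughly one second to every call, which matches the difference between the two benchmarks above.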

anttimc commented Nov 30, 2023

On Fedora, it probably stops at the condition where SELinux is checked.

anttimc commented Nov 30, 2023

As cpuinfo is a dependency of PyTables, which is a dependency of pandas for HDF5 file I/O, this issue can hit quite a lot of users.

To remedy the issue, I think hz_actual should be made optional in get_cpu_info, or there could be separate methods to get only the info that the user needs. What do you think, @workhorsy?

anttimc commented Dec 1, 2023

A suggestion:

Refactor the info
https://github.com/workhorsy/py-cpuinfo/blob/4824ec0746be0dcee9bf9528dc8cdd0c1640cd9d/cpuinfo/cpuinfo.py#L1566C1-L1585C4
into functions that return groups of info dictionaries and call only those functions that are desired.

The groups could be, for example, raw_info, hz_info, cache_info, and basic_info, as they are now grouped in the info dict in the linked snippet. Or it could be even more fine-grained.

The desired groups could be passed as flags to the subprocesses and finally as a list of arguments to _get_cpu_info_from_cpuid_actual. The default behaviour would be to return an info with all the groups.

Motivation: For example, in PyTables only the cache info is needed to optimize file I/O.
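
A rough sketch of what the grouping could look like, to make the suggestion concrete. Everything here is hypothetical: `get_cpu_info_groups`, the `_basic_info`/`_cache_info`/`_hz_info` helpers, and the returned values are placeholders, not the existing py-cpuinfo API. The point is only that a caller who asks for `cache_info` never triggers the slow frequency measurement.

```python
import time

calls = []  # records which groups were actually computed

def _basic_info():
    calls.append("basic_info")
    return {"vendor_id_raw": "ExampleVendor"}  # placeholder value

def _cache_info():
    calls.append("cache_info")
    return {"l2_cache_size": 256 * 1024}  # placeholder value

def _hz_info():
    calls.append("hz_info")
    time.sleep(0.05)  # stands in for the slow tick measurement
    return {"hz_actual": (0, 0)}  # placeholder value

# Hypothetical registry mapping group names to the functions that build them.
GROUPS = {
    "basic_info": _basic_info,
    "cache_info": _cache_info,
    "hz_info": _hz_info,
}

def get_cpu_info_groups(groups=None):
    """Return the merged info dict for the requested groups (all by default)."""
    names = list(GROUPS) if groups is None else list(groups)
    info = {}
    for name in names:
        info.update(GROUPS[name]())
    return info

# A caller like PyTables would request only what it needs.
cache_only = get_cpu_info_groups(["cache_info"])
print(cache_only, calls)
```

Calling `get_cpu_info_groups()` with no arguments keeps the current "return everything" behaviour, so existing users would not be affected.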

ullix commented Jan 2, 2024

Unsurprisingly, this also occurs on Linux Mint 21.
The 1-second sleep in CPUID.get_raw_hz, which I don't need at all, is really annoying. I hope this will get changed!
