Performance issue on Ubuntu #205

anttimc · 2023-11-15T09:11:46Z

Getting the CPU info on Ubuntu takes over one second. I noticed this slowdown in PyTables, where the function is called at import time.

Operating System: Ubuntu 20.04.6 LTS (in a Docker container)
Python version: 3.11.6
py-cpuinfo version: 9.0.0

Simple benchmark on Ubuntu (in a Docker container):

>>> import time
>>> import cpuinfo
>>> start=time.time(); cpu_info = cpuinfo.get_cpu_info(); print(time.time() - start)
1.0749070644378662

On Fedora (same machine and CPU):

>>> start=time.time(); cpu_info = cpuinfo.get_cpu_info(); print(time.time() - start)
0.07710409164428711

anttimc · 2023-11-30T12:14:49Z

I tracked the issue down with line_profiler: the time is spent in CPUID.get_raw_hz.
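For readers who want to reproduce this kind of hotspot search without installing anything, here is a minimal sketch that uses the standard library's cProfile instead of line_profiler. The functions `get_info`, `slow_step`, and `fast_step` are hypothetical stand-ins, not part of py-cpuinfo; swap in the real call you want to inspect.

```python
import cProfile
import io
import pstats
import time


def slow_step():
    # Hypothetical stand-in for a call that blocks, such as a fixed sleep.
    time.sleep(0.05)


def fast_step():
    # Hypothetical stand-in for cheap work.
    return sum(range(1000))


def get_info():
    # Hypothetical function under investigation.
    fast_step()
    slow_step()


def profile_report(func, top=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


report = profile_report(get_info)
print(report)
```

cProfile reports per-function totals rather than per-line timings, so it points at the slow function but not at the exact line inside it.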

CPUID.get_raw_hz sleeps for one second to get a number of ticks!
https://github.com/workhorsy/py-cpuinfo/blob/4824ec0746be0dcee9bf9528dc8cdd0c1640cd9d/cpuinfo/cpuinfo.py#L1508C1-L1521C1
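As a rough illustration of why the sleep does not have to last a full second, the sketch below estimates a counter's rate by sampling it before and after a short sleep and dividing by the measured elapsed time. It is an assumption-laden sketch, not the py-cpuinfo implementation: `estimate_hz` and `read_ticks` are hypothetical names, and `time.perf_counter_ns` stands in for a real CPU tick counter, so the expected result is about 1e9 rather than a CPU frequency.

```python
import time


def estimate_hz(read_ticks, interval=0.01):
    """Estimate a counter's rate by sampling it across a short sleep."""
    t0 = time.perf_counter()
    c0 = read_ticks()
    time.sleep(interval)
    c1 = read_ticks()
    t1 = time.perf_counter()
    # Divide by the measured elapsed time, not the requested interval,
    # because sleep() may overshoot.
    return (c1 - c0) / (t1 - t0)


# Stand-in counter: a nanosecond clock, so the rate should be close to 1e9.
hz = estimate_hz(time.perf_counter_ns)
print(f"{hz:.3e}")
```

A shorter interval trades accuracy for latency: the fixed cost of reading the counter and the timer becomes a larger share of the measurement as the interval shrinks.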

I think this kind of waiting time is totally unacceptable. I wonder why I don't see this issue on Fedora. Probably something in the try block fails before the sleep.

anttimc · 2023-11-30T12:30:00Z

On Fedora it probably stops at this condition, where SELinux is checked.

anttimc · 2023-11-30T13:03:22Z

As cpuinfo is a dependency of PyTables, which is a dependency of pandas for HDF5 file I/O, this issue can hit quite a lot of users.

To remedy the issue, I think hz_actual should be made optional in get_cpu_info, or there could be several methods so that the user gets only the info they need. What do you think, @workhorsy?

anttimc · 2023-12-01T07:18:42Z

A suggestion:

Refactor the info
https://github.com/workhorsy/py-cpuinfo/blob/4824ec0746be0dcee9bf9528dc8cdd0c1640cd9d/cpuinfo/cpuinfo.py#L1566C1-L1585C4
into functions that return groups of info dictionaries and call only those functions that are desired.

The groups could be for example raw_info, hz_info, cache_info, basic_info as they are now grouped in the info dict in the linked snippet. Or it could be even more fine-grained.

The desired groups could be passed as flags to the subprocesses and finally as a list of arguments to _get_cpu_info_from_cpuid_actual. The default behaviour would be to return an info with all the groups.

Motivation: for example, PyTables only needs the cache info, which it uses to optimize file I/O.
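The grouping idea could look roughly like the sketch below. Everything in it is hypothetical: `get_info`, the `PROVIDERS` table, the provider functions, and the placeholder values are illustrations of the proposal, not existing py-cpuinfo API. Only the group names and the hz_actual key come from this thread.

```python
import time


def _basic_info():
    return {"arch": "X86_64"}  # placeholder value


def _cache_info():
    return {"l2_cache_size": 256 * 1024}  # placeholder value


def _hz_info():
    time.sleep(0.05)  # stands in for the slow frequency measurement
    return {"hz_actual": 0}  # placeholder value


# Hypothetical mapping from group name to the function that collects it.
PROVIDERS = {
    "basic_info": _basic_info,
    "cache_info": _cache_info,
    "hz_info": _hz_info,
}


def get_info(groups=None):
    """Collect only the requested groups; by default, collect all of them."""
    selected = list(PROVIDERS) if groups is None else list(groups)
    unknown = set(selected) - set(PROVIDERS)
    if unknown:
        raise ValueError(f"unknown info groups: {sorted(unknown)}")
    info = {}
    for name in selected:
        info.update(PROVIDERS[name]())
    return info


# A caller that only needs cache sizes never pays for the slow group.
cache_only = get_info(["cache_info"])
print(cache_only)
```

Keeping the default at "all groups" preserves the current behaviour for existing callers, while a caller such as an import-time hook can opt out of the expensive measurement.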

ullix · 2024-01-02T14:38:45Z

Unsurprisingly, this also happens on Linux Mint 21.
The 1-second sleep in CPUID.get_raw_hz, which I don't need at all, is really annoying. I hope this will get changed!

anttimc mentioned this issue Nov 15, 2023

Performance issue at import time on Ubuntu PyTables/PyTables#1081

Closed

anttimc mentioned this issue Nov 30, 2023

cache result? #206

Open

maxnoe linked a pull request Jan 4, 2024 that will close this issue

Use a much shorter time to measure current cpu frequency #210

Open
