[DOCS] incomplete/incorrect statements regarding marker API #602
Comments
The bug where multiple threads read or report counters they should not access (marker API only) seems to go away when a topology file, generated via likwid-genTopoCfg, is present on the node. I assume that the topology parser (used when there is no topology file) has bugs that need to be fixed, or that the topology should not be recreated for threads within the marker ROI. Anyhow, if you want to reproduce the issue, I suggest starting with this command on an A64FX (or another node with multiple NUMA domains):
(see PR #603 for the ENERGY.txt file)
The issue comes from changed CPUsets in both cases. When an application is started through LIKWID, the application initially has a CPUset containing all selected HWthreads. If the topology file is provided, the application as well as all started threads read their topology from the file. This includes the CPUset (commonly all threads are allowed because
The wiki (https://github.com/RRZE-HPC/likwid/wiki/likwid-perfctr#using-the-marker-api) states the following:
but if any OpenMP region is opened before the LIKWID_MARKER_INIT call, the internal data structures are incorrect (or at least might be, depending on the underlying CPU/node architecture), and counters are read incorrectly.
E.g. on A64FX with 4 ranks and 6 threads, trying to read EA_L2 results in rank 0 / thread 0 reading the counter (so far so good), but rank 1 / threads 0+1, rank 2 / threads 0+1, and rank 3 / threads 0+1 also read the same counter. Thread 1 should not read it, but does so because of an incorrectly created internal topology data structure.
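For reference, here is a minimal sketch of the call order that the observation above suggests: LIKWID_MARKER_INIT is called serially, before the first OpenMP region is opened. The function name marked_sum and the region tag "sum" are made up for illustration, and the header name likwid-marker.h assumes a recent LIKWID release (older releases ship the macros in likwid.h). The sketch only shows the ordering; it does not confirm that this avoids the wrong counter reads on every architecture.

```c
#include <assert.h>
#include <stdio.h>

/* Use the real marker macros only when built with -DLIKWID_PERFMON;
   otherwise define them away so the file still compiles without LIKWID. */
#ifdef LIKWID_PERFMON
#include <likwid-marker.h>   /* assumption: recent LIKWID; older versions use likwid.h */
#else
#define LIKWID_MARKER_INIT
#define LIKWID_MARKER_START(tag)
#define LIKWID_MARKER_STOP(tag)
#define LIKWID_MARKER_CLOSE
#endif

/* Hypothetical helper: sums 0..n-1 inside a marked OpenMP region. */
long marked_sum(int n)
{
    assert(n >= 0);
    long total = 0;

    /* Initialise the marker API in serial code, before any parallel region. */
    LIKWID_MARKER_INIT;

    #pragma omp parallel
    {
        /* Every thread starts and stops the region it takes part in. */
        LIKWID_MARKER_START("sum");
        #pragma omp for reduction(+:total)
        for (int i = 0; i < n; ++i)
            total += i;
        LIKWID_MARKER_STOP("sum");
    }

    /* Write out the results once, again in serial code. */
    LIKWID_MARKER_CLOSE;
    return total;
}
```

Calling marked_sum(10) from main returns 45 with or without OpenMP enabled. To get actual measurements, the file would be built with -DLIKWID_PERFMON and OpenMP support, linked against LIKWID, and run under likwid-perfctr with the -m switch.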