parse /proc/PID/stat file by cgroup hook properly #2629

Merged
merged 1 commit into from
May 25, 2024

@vchlum vchlum commented Mar 8, 2024

Describe Bug or Feature

The cgroup hook does not parse the /proc/PID/stat file correctly: it assumes that a space is always a field delimiter.

The proc(5) man page states:

              (2) comm  %s
                     The filename of the executable, in parentheses.  Strings longer than TASK_COMM_LEN (16)
                     characters (including the terminating null byte) are silently truncated.  This is visible
                     whether or not the executable is swapped out.

and processes with spaces in their names, such as:

==> /proc/3523/stat <==
3523 (UVM global queue) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 20 0 1 0 3814 0 0 18446744073709551615 0 0 0 0 0 0 0 214 7483647 0 1 0 0 17 14 0 0 0 0 0 0 0 0 0 0 0 0 0

==> /proc/3524/stat <==
3524 (UVM deferred release queue) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 20 0 1 0 3814 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 16 0 0 0 0 0 0 0 0 0 0 0 0 0

==> /proc/3525/stat <==
3525 (UVM Tools Event Queue) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 20 0 1 0 3814 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 19 0 0 0 0 0 0 0 0 0 0 0 0 0

can cause errors like:

03/07/2024 06:21:31;0080;pbs_python;Hook;pbs_python;['Traceback (most recent call last):', '  File "<embedded code object>", line 6425, in main', '  File "<embedded code object>", line 1041, in invoke_handler', '  File "<embedded code object>", line 1240, in _execjob_launch_handler', '  File "<embedded code object>", line 4019, in add_pids', '  File "<embedded code object>", line 3980, in _get_pids_in_sid', "ValueError: invalid literal for int() with base 10: 'S'"]
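The failure can be reproduced outside the hook with a minimal sketch. It assumes that the hook splits the line on whitespace and converts the session ID, which proc(5) lists as field 6 (index 5), to an integer. That index is inferred from the function name `_get_pids_in_sid` and the error message, not taken from the hook source.

```python
# Line for PID 3524, copied from the report above.
line = (
    "3524 (UVM deferred release queue) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 "
    "20 0 1 0 3814 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 "
    "0 17 16 0 0 0 0 0 0 0 0 0 0 0 0 0"
)

# Naive whitespace split: the four-word comm field is broken into four tokens,
# which shifts every later field to the right.
fields = line.split()

# Assumption: the session ID is read from index 5 (field 6 in proc(5)).
SESSION_INDEX = 5

try:
    sid = int(fields[SESSION_INDEX])
    error = None
except ValueError as exc:
    sid = None
    error = str(exc)

print(fields[:7])
print(error)
```

With this input, index 5 holds the state character rather than the session ID, so the conversion raises the same ValueError as in the traceback.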

The bug is also reported in #2628

Describe Your Change

The change splits each line of the stat file with a regex instead of on bare spaces. A column is either a parenthesized string, \(.+\), or a run of non-whitespace characters, [^\s]+, and the line is split on the whitespace surrounding these columns.

Link to Design Doc

Attach Test and Valgrind Logs/Output

  • Manual parsing test in the Python interpreter (simple regex check):
>>> import re
>>> line="3523 (UVM global queue) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 20 0 1 0 3814 0 0 18446744073709551615 0 0 0 0 0 0 0 214 7483647 0 1 0 0 17 14 0 0 0 0 0 0 0 0 0 0 0 0 0"
>>> entries = re.split(r'\s+(\(.+\)|[^\s]+)\s+', line)
>>> print(entries)
['3523', '(UVM global queue)', 'S', '2', '0', '0', '0', '-1', '2129984', '0', '0', '0', '0', '0', '0', '0', '0', '20', '0', '1', '0', '3814', '0', '0', '18446744073709551615', '0', '0', '0', '0', '0', '0', '0', '214', '7483647', '0', '1', '0', '0', '17', '14', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0']
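For comparison, the same field layout can be obtained without a regex by cutting the line at the last closing parenthesis. This is only an illustrative sketch and is not part of this change; the helper name `split_stat_line` is hypothetical, and the sample line is shortened to its first few fields.

```python
def split_stat_line(line):
    """Split a /proc/PID/stat line into fields, keeping comm as one field.

    Hypothetical helper for illustration; not the code merged in this PR.
    """
    # Everything after the last ')' consists of plain space-separated fields.
    head, _, tail = line.rpartition(")")
    # Everything before the first ' (' is the PID; the rest is the comm text.
    pid, _, comm = head.partition(" (")
    return [pid, "(" + comm + ")"] + tail.split()


# Shortened sample line (first fields only) based on the report above.
line = "3524 (UVM deferred release queue) S 2 0 0 0 -1 2129984 0 0 0"
fields = split_stat_line(line)
print(fields[:6])

# A comm that itself contains parentheses and spaces stays in one field.
tricky = split_stat_line("42 (a) (b) R 1 2 3")
print(tricky)
```

Using the last closing parenthesis as the cut point keeps names that contain spaces or parentheses in a single field, so the session ID stays at index 5.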

@bayucan bayucan left a comment

Looks fine.

shapovalovts commented Apr 23, 2024

Will this fix be backported to already released OpenPBS versions?

@prakashcv13 prakashcv13 merged commit b1dd7c2 into openpbs:master May 25, 2024
6 checks passed
@vchlum vchlum deleted the cgroups_stat_file branch May 25, 2024 18:55
5 participants