Add new implementation of get_num_cpu #7568

eivindjahren · 2024-04-05T07:00:17Z

There are a few points where the new implementation is slightly different from the old definition.

Neither ecl300 nor opm flow use " for strings, but resdata.rd_util.get_num_cpu considers that a string.
eclipse allows / to be escaped in a filename. dir\/my_file.txt is a valid filename for the INCLUDE statement for eclipse, but for INCLUDE\n dir/file.txt / the first / ends the statement so it is equivalent to INCLUDE\n dir /. OPM flow does not interpret the first / as ending the statement so includes it sucessfully.
both eclipse and opm do not allow escaping ' like this 'filename\'.txt', altough resdata.rd_util.get_num_cpu will in that case consider the second ' as not terminating the string which is inconsistent with what happens in eclipse and opm. resdata.rd_util.get_num_cpu does not allow you to do double escapes, ie 'filename.txt\\' is considered a non-terminated string. In filenames, opm wil replace \ with / on linux.
In the old implementation, a keyword could not be terminated by the start of a string, so PARALLEL'0' \ would util_abort due to non-terminated string literal.
In the old implementation, a zero character would terminate the line early, this is no longer the case.
resdata.rd_util.get_num_cpu looks for the words SLAVE and PARALLEL anywhere on the line, but according to the specification, the word must be the first word on the line. The new implementation is limited to this, and also just the first 8 characters of that word as that is another requirement of the file format.
None of the implementations look for the keywords in an included file.

I ran the following with a version of resdata.rd_util.get_num_cpu where most failures return 1 instead of util_abort:

from resdata import ResdataUtil
from ert.config._get_num_cpu import get_num_cpu_from_data_file as ert_get_num_cpu

from hypothesis import given, event, settings
import hypothesis.strategies as st
from hypothesis.extra.lark import from_lark
import pytest
from lark import Lark

rd_get_num_cpu = ResdataUtil.get_num_cpu

common_grammar = r"""
        VALUE: STRING
             | NUMBER
             | "true" | "false" | "null"
        _STRING_INNER: /[^\\\x00]*?/

        STRING: "'" _STRING_INNER "'"
        %import common.SIGNED_NUMBER    -> NUMBER
        WS: " "
        NEWLINE: "\n"
        %ignore WS

        COMMENT: "--" /[^\n]*/ WS NEWLINE
        %ignore COMMENT
"""

parallel_sections = from_lark(
    Lark(
        common_grammar
        + r"""
        line: "PARALLEL" VALUE* "/" NEWLINE
        """,
        start="line",
    )
)

slaves_sections = from_lark(
    Lark(
        common_grammar
        + r"""
        line: VALUE+ " /" NEWLINE
        definition: "SLAVES" NEWLINE line+ "/" NEWLINE
        """,
        start="definition",
    )
)


@pytest.mark.usefixtures("use_tmpdir")
@settings(max_examples=100)
@given(
    st.lists(st.one_of(slaves_sections, parallel_sections)).map(lambda x: "\n".join(x))
)
def test_get_num_cpu_is_same_from_ert_and_resdata(data_contents):
    print(repr(data_contents))
    with open("TEST.DATA", "w") as f:
        f.write(data_contents)

    from_rd = rd_get_num_cpu("TEST.DATA")
    from_ert = ert_get_num_cpu("TEST.DATA")
    event(str(from_rd))
    if from_ert is None:
        assert from_rd == 1
    else:
        assert from_rd == from_ert

On this input, there are no differences.

The following bug in resdata was found while investigating what the impact would be from changing implementation:

util_sscanf_int treats empty string as 0 without errors resdata#959

PR title captures the intent of the changes, and is fitting for release notes.
Added appropriate release note label
Commit history is consistent and clean, in line with the contribution guidelines.
Make sure tests pass locally (after every commit!)

When applicable

When there are user facing changes: Updated documentation
New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
Large PR: Prepare changes in small commits for more convenient review
Bug fix: Add regression test for the bug
Bug fix: Create Backport PR to latest release

oyvindeide

Looks good, just some minor comments, and we decided to back port this to the last release with logging comparing to the old so we can evaluate that it works as expected.

src/ert/config/_get_num_cpu.py

oyvindeide · 2024-04-05T10:13:07Z

src/ert/substitution_list.py

- @no_type_check
- @staticmethod
- def from_dict(config_dict) -> SubstitutionList:
- subst_list = SubstitutionList()
-
- for key, val in config_dict.get("DEFINE", []):
- subst_list[key] = val
-
- if "<CONFIG_PATH>" not in subst_list:
- subst_list["<CONFIG_PATH>"] = config_dict.get(
- "CONFIG_DIRECTORY", os.getcwd()
- )
-
- num_cpus = config_dict.get("NUM_CPU")
- if num_cpus is None and "DATA_FILE" in config_dict:
- num_cpus = get_num_cpu_from_data_file(config_dict.get("DATA_FILE"))
- if num_cpus is None:
- num_cpus = 1
- subst_list["<NUM_CPU>"] = str(num_cpus)
-
- for key, val in config_dict.get("DATA_KW", []):
- subst_list[key] = val
-
- return subst_list
-


Why is this moved? Makes equal sense to keep this logic here?

Its moved as you get circular import problems otherwise. Also, the whole config_dict debacle we would like to contain inside of ert.config

Co-authored-by: Øyvind Eide <[email protected]>

codecov-commenter · 2024-04-05T10:42:08Z

Codecov Report

Attention: Patch coverage is 90.90909% with 11 lines in your changes are missing coverage. Please review.

Project coverage is 84.48%. Comparing base (2b180e9) to head (93d06c2).
Report is 96 commits behind head on main.

Files	Patch %	Lines
src/ert/config/_get_num_cpu.py	89.71%	11 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7568      +/-   ##
==========================================
- Coverage   85.46%   84.48%   -0.99%     
==========================================
  Files         384      388       +4     
  Lines       22867    23336     +469     
  Branches      876      885       +9     
==========================================
+ Hits        19544    19716     +172     
- Misses       3213     3506     +293     
- Partials      110      114       +4

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Move substitution_list constructor to ert_config

0ce54b5

eivindjahren added the release-notes:skip If there should be no mention of this in release notes label Apr 5, 2024

eivindjahren force-pushed the install_new_parallel_alongside branch from 1a566c7 to c13ab33 Compare April 5, 2024 07:37

Replace get_num_cpu from resdata.util

304bb86

eivindjahren force-pushed the install_new_parallel_alongside branch from c13ab33 to 304bb86 Compare April 5, 2024 07:40

oyvindeide approved these changes Apr 5, 2024

View reviewed changes

eivindjahren and others added 2 commits April 5, 2024 12:23

Update src/ert/config/_get_num_cpu.py

79bb9cb

Co-authored-by: Øyvind Eide <[email protected]>

Update src/ert/config/_get_num_cpu.py

93d06c2

Co-authored-by: Øyvind Eide <[email protected]>

eivindjahren merged commit e2bd7c1 into equinor:main Apr 5, 2024
37 checks passed

eivindjahren deleted the install_new_parallel_alongside branch April 5, 2024 11:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add new implementation of get_num_cpu #7568

Add new implementation of get_num_cpu #7568

eivindjahren commented Apr 5, 2024 •

edited

Loading

oyvindeide left a comment

oyvindeide Apr 5, 2024

eivindjahren Apr 5, 2024

codecov-commenter commented Apr 5, 2024

Add new implementation of get_num_cpu #7568

Add new implementation of get_num_cpu #7568

Conversation

eivindjahren commented Apr 5, 2024 • edited Loading

When applicable

oyvindeide left a comment

Choose a reason for hiding this comment

oyvindeide Apr 5, 2024

Choose a reason for hiding this comment

eivindjahren Apr 5, 2024

Choose a reason for hiding this comment

codecov-commenter commented Apr 5, 2024

Codecov Report

eivindjahren commented Apr 5, 2024 •

edited

Loading