Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new implementation of get_num_cpu #7568

Merged

Conversation

eivindjahren
Copy link
Contributor

@eivindjahren eivindjahren commented Apr 5, 2024

There are a few points where the new implementation is slightly different from the old definition.

  • Neither ecl300 nor opm flow use " for strings, but resdata.rd_util.get_num_cpu considers that a string.
  • eclipse allows / to be escaped in a filename. dir\/my_file.txt is a valid filename for the INCLUDE statement for eclipse, but for INCLUDE\n dir/file.txt / the first / ends the statement so it is equivalent to INCLUDE\n dir /. OPM flow does not interpret the first / as ending the statement so includes it sucessfully.
  • both eclipse and opm do not allow escaping ' like this 'filename\'.txt', altough resdata.rd_util.get_num_cpu will in that case consider the second ' as not terminating the string which is inconsistent with what happens in eclipse and opm. resdata.rd_util.get_num_cpu does not allow you to do double escapes, ie 'filename.txt\\' is considered a non-terminated string. In filenames, opm wil replace \ with / on linux.
  • In the old implementation, a keyword could not be terminated by the start of a string, so PARALLEL'0' \ would util_abort due to non-terminated string literal.
  • In the old implementation, a zero character would terminate the line early, this is no longer the case.
  • resdata.rd_util.get_num_cpu looks for the words SLAVE and PARALLEL anywhere on the line, but according to the specification, the word must be the first word on the line. The new implementation is limited to this, and also just the first 8 characters of that word as that is another requirement of the file format.
  • None of the implementations look for the keywords in an included file.

I ran the following with a version of resdata.rd_util.get_num_cpu where most failures return 1 instead of util_abort:

from resdata import ResdataUtil
from ert.config._get_num_cpu import get_num_cpu_from_data_file as ert_get_num_cpu

from hypothesis import given, event, settings
import hypothesis.strategies as st
from hypothesis.extra.lark import from_lark
import pytest
from lark import Lark

rd_get_num_cpu = ResdataUtil.get_num_cpu

common_grammar = r"""
        VALUE: STRING
             | NUMBER
             | "true" | "false" | "null"
        _STRING_INNER: /[^\\\x00]*?/

        STRING: "'" _STRING_INNER "'"
        %import common.SIGNED_NUMBER    -> NUMBER
        WS: " "
        NEWLINE: "\n"
        %ignore WS

        COMMENT: "--" /[^\n]*/ WS NEWLINE
        %ignore COMMENT
"""

parallel_sections = from_lark(
    Lark(
        common_grammar
        + r"""
        line: "PARALLEL" VALUE* "/" NEWLINE
        """,
        start="line",
    )
)

slaves_sections = from_lark(
    Lark(
        common_grammar
        + r"""
        line: VALUE+ " /" NEWLINE
        definition: "SLAVES" NEWLINE line+ "/" NEWLINE
        """,
        start="definition",
    )
)


@pytest.mark.usefixtures("use_tmpdir")
@settings(max_examples=100)
@given(
    st.lists(st.one_of(slaves_sections, parallel_sections)).map(lambda x: "\n".join(x))
)
def test_get_num_cpu_is_same_from_ert_and_resdata(data_contents):
    print(repr(data_contents))
    with open("TEST.DATA", "w") as f:
        f.write(data_contents)

    from_rd = rd_get_num_cpu("TEST.DATA")
    from_ert = ert_get_num_cpu("TEST.DATA")
    event(str(from_rd))
    if from_ert is None:
        assert from_rd == 1
    else:
        assert from_rd == from_ert

On this input, there are no differences.

The following bug in resdata was found while investigating what the impact would be from changing implementation:

  • PR title captures the intent of the changes, and is fitting for release notes.
  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.
  • Make sure tests pass locally (after every commit!)

When applicable

  • When there are user facing changes: Updated documentation
  • New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
  • Large PR: Prepare changes in small commits for more convenient review
  • Bug fix: Add regression test for the bug
  • Bug fix: Create Backport PR to latest release

@eivindjahren eivindjahren added the release-notes:skip If there should be no mention of this in release notes label Apr 5, 2024
@eivindjahren eivindjahren force-pushed the install_new_parallel_alongside branch from 1a566c7 to c13ab33 Compare April 5, 2024 07:37
@eivindjahren eivindjahren force-pushed the install_new_parallel_alongside branch from c13ab33 to 304bb86 Compare April 5, 2024 07:40
Copy link
Collaborator

@oyvindeide oyvindeide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just some minor comments, and we decided to back port this to the last release with logging comparing to the old so we can evaluate that it works as expected.

src/ert/config/_get_num_cpu.py Outdated Show resolved Hide resolved
src/ert/config/_get_num_cpu.py Outdated Show resolved Hide resolved
Comment on lines -27 to -51
@no_type_check
@staticmethod
def from_dict(config_dict) -> SubstitutionList:
subst_list = SubstitutionList()

for key, val in config_dict.get("DEFINE", []):
subst_list[key] = val

if "<CONFIG_PATH>" not in subst_list:
subst_list["<CONFIG_PATH>"] = config_dict.get(
"CONFIG_DIRECTORY", os.getcwd()
)

num_cpus = config_dict.get("NUM_CPU")
if num_cpus is None and "DATA_FILE" in config_dict:
num_cpus = get_num_cpu_from_data_file(config_dict.get("DATA_FILE"))
if num_cpus is None:
num_cpus = 1
subst_list["<NUM_CPU>"] = str(num_cpus)

for key, val in config_dict.get("DATA_KW", []):
subst_list[key] = val

return subst_list

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this moved? Makes equal sense to keep this logic here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its moved as you get circular import problems otherwise. Also, the whole config_dict debacle we would like to contain inside of ert.config

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 90.90909% with 11 lines in your changes are missing coverage. Please review.

Project coverage is 84.48%. Comparing base (2b180e9) to head (93d06c2).
Report is 96 commits behind head on main.

Files Patch % Lines
src/ert/config/_get_num_cpu.py 89.71% 11 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7568      +/-   ##
==========================================
- Coverage   85.46%   84.48%   -0.99%     
==========================================
  Files         384      388       +4     
  Lines       22867    23336     +469     
  Branches      876      885       +9     
==========================================
+ Hits        19544    19716     +172     
- Misses       3213     3506     +293     
- Partials      110      114       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@eivindjahren eivindjahren merged commit e2bd7c1 into equinor:main Apr 5, 2024
37 checks passed
@eivindjahren eivindjahren deleted the install_new_parallel_alongside branch April 5, 2024 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes:skip If there should be no mention of this in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants