Parameter parsing fails for strings containing `=` character #772

nateharms · 2024-01-30T15:58:42Z

Hello there! I'm using Papermill to automate some Jupyter notebook execution and I think I may have caught a potential bug: when running Papermill using strings containing a = character as the default parameter, the parser fails.

I get an error that looks like the following, but anonymized:

Unable to parse line X 'parameter='./all/date=20240102/ml_flow_run_id=Y/model/''.

After a little digging, I think this is because of the following code in translators.py:

if nequal > 0:
   grouped_variable.append(flatten_accumulator(accumulator))
   accumulator = []
   if nequal > 1:
       logger.warning(f"Unable to parse line {iline + 1} '{line}'.")
       continue ```

Since my parameter has = characters in it, nequal > 1 == True and we run into the warning.

Is this a desired behavior? i.e., is it expected that there should not be any = within default strong parameters?
Or are there plans to enable parameterization of strings containing = as defaults?

Thank you!

The text was updated successfully, but these errors were encountered:

MSeal · 2024-02-08T22:29:37Z

For now it's a constraint of the implementation it seems. We could improve that '=' check to check after doing quoted string capture groups to get around it. But someone would need to write a fairly good suite of unittests to ensure it works correctly in all cases (harder problem to get right than it may seem).

FlorianBracq · 2024-04-17T14:59:06Z

Hello,

would it be possible to use the built-in ast package to parse the python code?
https://docs.python.org/3/library/ast.html

I saw some comments from ~2020 about this not being compatible with other languages, but I would argue it's a safer approach than the regex version and would catch corner cases that are not yet properly supported by the regex, such as the one shared here or if the type provided contains a "." (cf sample below)

Sample parameter cell

import datetime as dt

string_param: str = ""
equal_param = "./all/date=20240102/ml_flow_run_id=Y/model/"
multi_line_param: str = "test1" + "test2"
date_param: dt.datetime = dt.datetime(2024, 4, 17, 0, 0, 0)
a = b = 2
list_param = []
dict_param = {
    "key1": "value",
    "key2": [1, 2, 3],
    "key3": "=",
}

Sample parsing code:

# param_code is a variable containing the above cell's source code
tree: ast.Module = ast.parse(param_code)
for node in ast.walk(tree):
    if isinstance(ann_assign := node, ast.AnnAssign) and isinstance(
            name := ann_assign.target, ast.Name
        ):
            print(name.id)
    elif isinstance(assign := node, ast.Assign):
        for target in assign.targets:
            if isinstance(target, ast.Name):
                print(target.id)

I get the following output:

string_param
equal_param
multi_line_param
date_param
a
b
list_param
dict_param

@MSeal: if that would be of interest to you, I can propose a PR for this :)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parameter parsing fails for strings containing `=` character #772

Parameter parsing fails for strings containing `=` character #772

nateharms commented Jan 30, 2024

MSeal commented Feb 8, 2024

FlorianBracq commented Apr 17, 2024

Parameter parsing fails for strings containing = character #772

Parameter parsing fails for strings containing = character #772

Comments

nateharms commented Jan 30, 2024

MSeal commented Feb 8, 2024

FlorianBracq commented Apr 17, 2024

Parameter parsing fails for strings containing `=` character #772

Parameter parsing fails for strings containing `=` character #772