Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parameter parsing fails for strings containing = character #772

Open
nateharms opened this issue Jan 30, 2024 · 2 comments
Open

Parameter parsing fails for strings containing = character #772

nateharms opened this issue Jan 30, 2024 · 2 comments

Comments

@nateharms
Copy link

Hello there! I'm using Papermill to automate some Jupyter notebook execution and I think I may have caught a potential bug: when running Papermill using strings containing a = character as the default parameter, the parser fails.

I get an error that looks like the following, but anonymized:

Unable to parse line X 'parameter='./all/date=20240102/ml_flow_run_id=Y/model/''.

After a little digging, I think this is because of the following code in translators.py:

if nequal > 0:
   grouped_variable.append(flatten_accumulator(accumulator))
   accumulator = []
   if nequal > 1:
       logger.warning(f"Unable to parse line {iline + 1} '{line}'.")
       continue ```

Since my parameter has = characters in it, nequal > 1 == True and we run into the warning.

Is this a desired behavior? i.e., is it expected that there should not be any = within default strong parameters?
Or are there plans to enable parameterization of strings containing = as defaults?

Thank you!

@MSeal
Copy link
Member

MSeal commented Feb 8, 2024

For now it's a constraint of the implementation it seems. We could improve that '=' check to check after doing quoted string capture groups to get around it. But someone would need to write a fairly good suite of unittests to ensure it works correctly in all cases (harder problem to get right than it may seem).

@FlorianBracq
Copy link

Hello,

would it be possible to use the built-in ast package to parse the python code?
https://docs.python.org/3/library/ast.html

I saw some comments from ~2020 about this not being compatible with other languages, but I would argue it's a safer approach than the regex version and would catch corner cases that are not yet properly supported by the regex, such as the one shared here or if the type provided contains a "." (cf sample below)

Sample parameter cell

import datetime as dt

string_param: str = ""
equal_param = "./all/date=20240102/ml_flow_run_id=Y/model/"
multi_line_param: str = "test1" + "test2"
date_param: dt.datetime = dt.datetime(2024, 4, 17, 0, 0, 0)
a = b = 2
list_param = []
dict_param = {
    "key1": "value",
    "key2": [1, 2, 3],
    "key3": "=",
}

Sample parsing code:

# param_code is a variable containing the above cell's source code
tree: ast.Module = ast.parse(param_code)
for node in ast.walk(tree):
    if isinstance(ann_assign := node, ast.AnnAssign) and isinstance(
            name := ann_assign.target, ast.Name
        ):
            print(name.id)
    elif isinstance(assign := node, ast.Assign):
        for target in assign.targets:
            if isinstance(target, ast.Name):
                print(target.id)

I get the following output:

string_param
equal_param
multi_line_param
date_param
a
b
list_param
dict_param

@MSeal: if that would be of interest to you, I can propose a PR for this :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants