Pick the deepest error among the most relevant ones in each subschema #1258

Open

ilia1243 commented May 19, 2024

Improves best_match in the presence of anyOf / oneOf: it now calculates the most relevant error within each subschema separately and then chooses the deepest of those.

In particular, for an anyOf / oneOf keyword with a single subschema, the best error is resolved as if the subschema were not enclosed by the keyword at all.
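
As a rough illustration of the selection rule described above, the sketch below groups the errors of a failed anyOf / oneOf by subschema index, keeps the most relevant one per subschema, and returns the deepest of those. `Err`, `relevance` and `best_in_context` are hypothetical stand-ins written for this example; in particular, the key here only looks at instance path depth, which is a simplification and not jsonschema's actual `relevance` function.

```python
from dataclasses import dataclass, field


@dataclass
class Err:
    """Hypothetical stand-in for a validation error (not jsonschema's class)."""

    message: str
    path: tuple = ()         # location of the failure in the instance
    schema_path: tuple = ()  # first element: index of the failing subschema
    context: list = field(default_factory=list)


def relevance(error):
    # Simplified assumption: shallower errors are more relevant.
    return -len(error.path)


def best_in_context(parent, key=relevance):
    # Keep the most relevant error for each subschema index.
    per_subschema = {}
    for error in parent.context:
        index = error.schema_path[0]
        current = per_subschema.get(index)
        if current is None or key(error) > key(current):
            per_subschema[index] = error
    # Among the per-subschema winners, the deepest one has the lowest key.
    return min(per_subschema.values(), key=key)


# Errors shaped like those of the single-subschema case in this description.
parent = Err(
    "anyOf failed",
    context=[
        Err("minProperties", path=("foo",),
            schema_path=(0, "properties", "foo", "minProperties")),
        Err("type", path=("foo", "bar"),
            schema_path=(0, "properties", "foo", "properties", "bar", "type")),
    ],
)
print(best_in_context(parent).message)  # minProperties
```

With a single subschema there is only one per-subschema winner, so the result is the same error that would be picked if the subschema were not wrapped at all.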

To reproduce:

from jsonschema import Draft202012Validator as Validator, exceptions

for applicator in "anyOf", "oneOf":
    # Should match {"properties": {"foo": {"minProperties": 2}}}
    schema = {
        applicator: [
            {
                "properties": {
                    "foo": {
                        "minProperties": 2,
                        "properties": {"bar": {"type": "object"}},
                    },
                },
            },
        ],
    }
    instance = {"foo": {"bar": []}}
    error = exceptions.best_match(Validator(schema).iter_errors(instance))
    print(error)

Reverts the main code changes from commit b20234e while preserving its tests.

Closes: #1257


📚 Documentation preview 📚: https://python-jsonschema--1258.org.readthedocs.build/en/1258/

Julian (Member) commented May 20, 2024

Thanks! I'll have a look at this in the next day or two, but initially it looks reasonable!

@@ -464,9 +464,19 @@ def best_match(errors, key=relevance):
    best = max(itertools.chain([best], errors), key=key)

    while best.context:
        # Calculate the most relevant error in each separate subschema
        best_in_subschemas = [None] * len(best.validator_value)
Member


I know I'm still behind on looking at this, but just to mention why I haven't yet merged it, I want to stare at this closer, as I'm somewhat suspicious of the types here -- specifically, it's likely true that validator_value is always a container for any builtin JSON Schema keywords, but certainly not in general (i.e. someone can invent some other one).

And similarly below I'm slightly suspicious of the schema_path business.

Not to say any of it looks wrong, but I want to convince myself whether it needs any additional hardening.

And of course again thanks for the thought and PR, I'll get to it sometime soon I hope (days not weeks).
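
To make the concern about types concrete, here is a small sketch of what happens to the `[None] * len(...)` expression from the diff when a keyword's value has no length. `preallocate` is a hypothetical helper written for this example, and the integer and iterator values stand in for whatever a custom (non-builtin) keyword might use; none of this is taken from the PR.

```python
def preallocate(validator_value):
    """Mirror the diff's `[None] * len(...)`; return None if the value has no length."""
    try:
        return [None] * len(validator_value)
    except TypeError:
        # Raised for values such as integers or plain iterators.
        return None


print(preallocate([{"type": "string"}, {"type": "integer"}]))  # [None, None]
print(preallocate(3))                                          # None
print(preallocate(iter([{"type": "string"}])))                 # None
```

A list of subschemas, as used by the builtin anyOf / oneOf, works; a bare number or an iterator does not, which is the case any hardening would need to cover.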

Author


Assuming that only oneOf and anyOf can produce a non-empty context, the first element of each context error's schema_path is seemingly always an integer, because it comes from enumerating the subschemas: https://github.com/python-jsonschema/jsonschema/blob/v4.22.0/jsonschema/_keywords.py#L340
This also implies that, yes, validator_value may be something that can be enumerated but does not support len.

Alternatively, assuming that the errors in best.context appear in strict order of .schema_path (as the algorithms in _keywords.anyOf and _keywords.oneOf produce them), the list can be built without len:

        best_in_subschemas = []
        for error in best.context:
            index = error.schema_path[0]
            if index == len(best_in_subschemas):
                best_in_subschemas.append(error)
            else:
                prev = best_in_subschemas[index]
                best_in_subschemas[index] = max(prev, error, key=key)
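
For anyone who wants to try the snippet above in isolation, here is a self-contained version. The loop body is unchanged; `best_per_subschema`, `make_error` and `depth_key` are hypothetical names added for this example, the errors are `SimpleNamespace` stand-ins rather than real jsonschema errors, and the key is a simplified assumption based on path depth only.

```python
from types import SimpleNamespace


def best_per_subschema(context, key):
    """Collect the most relevant error for each subschema index.

    Assumes errors arrive starting at index 0 with no gaps in the indices.
    """
    best_in_subschemas = []
    for error in context:
        index = error.schema_path[0]
        if index == len(best_in_subschemas):
            # First error seen for the next subschema.
            best_in_subschemas.append(error)
        else:
            prev = best_in_subschemas[index]
            best_in_subschemas[index] = max(prev, error, key=key)
    return best_in_subschemas


def make_error(name, index, depth):
    # Hypothetical stand-in exposing only the attributes used here.
    return SimpleNamespace(name=name, schema_path=[index], path=["p"] * depth)


def depth_key(error):
    # Simplified assumption: shallower errors rank higher.
    return -len(error.path)


ordered = [make_error("a", 0, 2), make_error("b", 0, 1), make_error("c", 1, 3)]
print([e.name for e in best_per_subschema(ordered, depth_key)])  # ['b', 'c']
```

If the ordering assumption is violated by a gap (for example, an error with index 1 arriving before any error with index 0), the lookup raises IndexError, so a broken assumption fails loudly rather than silently misgrouping errors.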

Development

Successfully merging this pull request may close these issues.

Mitigate undesired side effect of new best_match behaviour with alternative proposal