Pick the deepest error among the most relevant ones in each subschema #1258

ilia1243 · 2024-05-19T19:33:19Z

Improves best_match in the presence of anyOf / oneOf: the most relevant error is computed within each subschema separately, and the deepest of those is chosen.

In particular, when anyOf / oneOf has only a single subschema, the best error is resolved as if the subschema were not wrapped in these keywords.
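
A minimal sketch of the selection described above, not the code in this PR. `StubError`, `simple_relevance` and `pick_from_branches` are hypothetical names, the relevance key is a simplified assumption rather than the library's own `relevance`, and "deepest" is assumed to mean the longest instance path.

```python
from dataclasses import dataclass, field


@dataclass
class StubError:
    """Hypothetical stand-in for jsonschema's ValidationError."""

    validator: str
    path: tuple = ()  # location of the failure inside the instance
    schema_path: tuple = ()  # first element: index of the anyOf/oneOf branch
    context: list = field(default_factory=list)


def simple_relevance(error):
    # Simplified assumption: shallower errors rank higher, and errors from
    # anyOf/oneOf themselves rank lower than errors from other keywords.
    return (-len(error.path), error.validator not in {"anyOf", "oneOf"})


def pick_from_branches(parent, key=simple_relevance):
    """Most relevant error per branch, then the deepest of those."""
    best_per_branch = {}
    for error in parent.context:
        index = error.schema_path[0]
        prev = best_per_branch.get(index)
        if prev is None or key(error) > key(prev):
            best_per_branch[index] = error
    # "Deepest" is taken here to mean the longest instance path.
    return max(best_per_branch.values(), key=lambda e: len(e.path))


# Mirrors a single-branch case: minProperties fails at "foo",
# type fails at "foo/bar", and both belong to branch 0.
min_props = StubError(
    "minProperties", ("foo",), (0, "properties", "foo", "minProperties")
)
bad_type = StubError(
    "type", ("foo", "bar"), (0, "properties", "foo", "properties", "bar", "type")
)
parent = StubError("anyOf", context=[min_props, bad_type])
print(pick_from_branches(parent).validator)  # minProperties
```

With a single branch the second step has only one candidate, so the result is whatever the relevance key prefers inside that branch, which is the behaviour the description asks for.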

To reproduce:

from jsonschema import Draft202012Validator as Validator, exceptions

for applicator in "anyOf", "oneOf":
    # Should match {"properties": {"foo": {"minProperties": 2}}}
    schema = {
        applicator: [
            {
                "properties": {
                    "foo": {
                        "minProperties": 2,
                        "properties": {"bar": {"type": "object"}},
                    },
                },
            },
        ],
    }
    instance = {"foo": {"bar": []}}
    error = exceptions.best_match(Validator(schema).iter_errors(instance))
    print(error)

Reverts the main code changes from commit b20234e while preserving its tests.

Closes: #1257

📚 Documentation preview 📚: https://python-jsonschema--1258.org.readthedocs.build/en/1258/


Julian · 2024-05-20T06:51:09Z

Thanks! I'll have a look at this in the next day or two but initially looks reasonable!

Julian · 2024-05-30T13:27:18Z

jsonschema/exceptions.py

@@ -464,9 +464,19 @@ def best_match(errors, key=relevance):
 best = max(itertools.chain([best], errors), key=key)

 while best.context:
+ # Calculate the most relevant error in each separate subschema
+ best_in_subschemas = [None] * len(best.validator_value)


I know I'm still behind on looking at this, but just to mention why I haven't yet merged it, I want to stare at this closer, as I'm somewhat suspicious of the types here -- specifically, it's likely true that validator_value is always a container for any builtin JSON Schema keywords, but certainly not in general (i.e. someone can invent some other one).

And similarly below I'm slightly suspicious of the schema_path business.

Not to say any of it looks wrong, but I want to convince myself if it needs any additional hardening.

And of course again thanks for the thought and PR, I'll get to it sometime soon I hope (days not weeks).

ilia1243 · 2024-05-30

If we assume that only oneOf and anyOf can have a non-empty context, then the first element of each context error's schema_path is seemingly always an integer, because it is produced by enumerating the subschemas: https://github.com/python-jsonschema/jsonschema/blob/v4.22.0/jsonschema/_keywords.py#L340
This also implies that, yes, validator_value may be something that can be enumerated but does not support len.
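
To illustrate the last point with plain Python, independent of jsonschema: a generator is one example of a value that can be enumerated but has no length. The `subschemas` function below is hypothetical and only stands in for such a value.

```python
def subschemas():
    # A generator can be enumerated, but it has no length.
    yield {"type": "string"}
    yield {"type": "integer"}


try:
    len(subschemas())
except TypeError:
    has_len = False
else:
    has_len = True

indices = [index for index, _ in enumerate(subschemas())]
print(has_len, indices)  # False [0, 1]
```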

Alternatively, assuming that the errors in best.context are ordered by the subschema index at the start of their schema_path (see the corresponding algorithms in _keywords.anyOf and _keywords.oneOf):

best_in_subschemas = []
for error in best.context:
    index = error.schema_path[0]
    if index == len(best_in_subschemas):
        best_in_subschemas.append(error)
    else:
        prev = best_in_subschemas[index]
        best_in_subschemas[index] = max(prev, error, key=key)
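
As a rough check of how much that loop depends on the ordering assumption, it can be wrapped in a function and run against stand-in errors. `Err`, `best_per_subschema` and `by_score` are hypothetical names, and the numeric score replaces the real relevance key.

```python
from collections import namedtuple

# Hypothetical stand-in carrying only the attributes the loop reads.
Err = namedtuple("Err", ["name", "schema_path", "score"])


def best_per_subschema(context, key):
    """The loop from the comment above, wrapped in a function."""
    best_in_subschemas = []
    for error in context:
        index = error.schema_path[0]
        if index == len(best_in_subschemas):
            best_in_subschemas.append(error)
        else:
            prev = best_in_subschemas[index]
            best_in_subschemas[index] = max(prev, error, key=key)
    return best_in_subschemas


def by_score(error):
    return error.score


ordered = [Err("a", (0,), 1), Err("b", (0,), 3), Err("c", (1,), 2)]
print([e.name for e in best_per_subschema(ordered, by_score)])  # ['b', 'c']

# If a branch index is skipped, the ordering assumption is violated
# and the list lookup fails.
skipped = [Err("a", (0,), 1), Err("c", (2,), 2)]
try:
    best_per_subschema(skipped, by_score)
except IndexError:
    print("IndexError: context was not ordered by subschema index")
```

The version avoids len(best.validator_value) entirely, at the cost of raising IndexError if the context ever skips an index.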
