Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GenericModel[Any] raises ValidationError for GenericModel[SomeModel] #9414

Open
1 task done
gsakkis opened this issue May 8, 2024 · 2 comments
Open
1 task done
Labels
bug V2 Bug related to Pydantic V2 pending Awaiting a response / confirmation

Comments

@gsakkis
Copy link
Contributor

gsakkis commented May 8, 2024

Initial Checks

  • I confirm that I'm using Pydantic V2

Description

Annotating a field as ResponseModel[Any] (where ResponseModel is a generic BaseModel) fails to validate values of type ResponseModel and ResponseModel[Product]; it only validates values of ResponseModel[Any].

On the other hand annotating the field as ResponseModel validates all three cases, however mypy gives Missing type parameters for generic type "ResponseModel".

Questions:

  • Is the ResponseModel[Any] validation working as expected or it's a bug?
  • Is it safe to # type: ignore the second version that seems to be working?

Example Code

from typing import Any, Generic, TypeVar

from pydantic import BaseModel, ValidationError

T = TypeVar("T")


class ResponseModel(BaseModel, Generic[T]):
    content: T


class Product(BaseModel):
    name: str
    price: float


class Order(BaseModel):
    id: int
    product: ResponseModel[Any]
    # replacing the previous annotation with the following succeeds at runtime but fails type checking
    # product: ResponseModel


product = Product(name="Apple", price=0.5)
response1: ResponseModel[Any] = ResponseModel[Any](content=product)
response2: ResponseModel[Any] = ResponseModel(content=product)
response3: ResponseModel[Any] = ResponseModel[Product](content=product)

for response in response1, response2, response3:
    try:
        order = Order(id=1, product=response)
        print(f"{response!r} succeeded")
    except ValidationError:
        print(f"{response!r} failed")

Output:

ResponseModel[Any](content=Product(name='Apple', price=0.5)) succeeded
ResponseModel(content=Product(name='Apple', price=0.5)) failed
ResponseModel[Product](content=Product(name='Apple', price=0.5)) failed

Python, Pydantic & OS Version

pydantic version: 2.7.1
        pydantic-core version: 2.18.2
          pydantic-core build: profile=release pgo=true
               python version: 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:49:36) [Clang 16.0.6 ]
                     platform: macOS-14.1-arm64-arm-64bit
             related packages: mypy-1.9.0 typing_extensions-4.10.0 pydantic-settings-2.2.1
                       commit: unknown
@gsakkis gsakkis added bug V2 Bug related to Pydantic V2 pending Awaiting a response / confirmation labels May 8, 2024
@sydney-runkle
Copy link
Member

Hi @gsakkis,

Thanks for your questions. This comment might help to add some context regarding some of the nuances with validation and parametrized generics: #8884 (comment).

The case with Any is interesting. This does seem a bit buggy to me. Specifically, this works:

from typing import Any, Generic, TypeVar

from pydantic import BaseModel, ValidationError

T = TypeVar("T")


class ResponseModel(BaseModel, Generic[T]):
    content: T


class Product(BaseModel):
    name: str
    price: float


class Order(BaseModel):
    id: int
    product: ResponseModel[Any]
    # replacing the previous annotation with the following succeeds at runtime but fails type checking
    # product: ResponseModel


product = Product(name="Apple", price=0.5)
response1: ResponseModel[Any] = ResponseModel[Any](content=product)
response2: ResponseModel[Any] = ResponseModel(content=product)
response3: ResponseModel[Any] = ResponseModel[Product](content=product)

for response in response1, response2, response3:
    try:
        order = Order(id=1, product=response.model_dump())
        print(f"{response!r} succeeded")
    except ValidationError:
        print(f"{response!r} failed")

I'm going to chat with @dmontagu, our resident generics expert, about this and will get back to you!

@dmontagu
Copy link
Contributor

dmontagu commented May 30, 2024

So, the problem here is that we create concrete subclasses for each parametrization of a generic model, and model validation disallows instances of classes that are not subclasses.

More concretely, let's consider ResponseModel from your example, and I'll go a step further and pretend we also have class SubResponseModel(ResponseModel): .... In this case, ResponseModel[Any] is a proper subclass of ResponseModel, and ResponseModel[Product] is another proper subclass of ResponseModel. There is no connection between ResponseModel[Any] and ResponseModel[Product] — they might as well be class ResponseModelProduct(ResponseModel): ... and class ResponseModelAny(ResponseModel): .... Given this, hopefully it's clear why the current behavior raises errors (even if it's obvious that other behavior would be preferable).

The way I think this should work is that, any time you have a generic model which is an instance of a possibly-parametrized version of a subclass of the annotated field, if we would otherwise raise an error due to a violation of the class hierarchy, we should just attempt to revalidate the data similar-ish to how we would in v1. I'm not exactly sure of the exact semantics of how this should work, i.e., do we (effectively) call model_dump and model_validate, or try to re-validate the raw __dict__, or what, but regardless of what we choose I think it would improve the situation for the vast majority of scenarios where you want to have validation of subclasses.

More specifically, let's say I have a type annotation my_field: ResponseModel[X], and pass a value to that field which has type SubResponseModel[Y]. In this case, what I think we should do is (shallow) dump the instance of SubResponseModel[Y] and then attempt to validate that data into the field of type ResponseModel[X].

Note that even if we did that, there is still some obvious misbehavior that I'm sure some people would complain about, and so we may want to change those semantics a bit. In particular, if X is Any, then regardless of what Y is, an instance of SubResponseModel[Y] would be re-validated into ResponseModel[Any]. In particular, any fields present on SubResponseModel that are not present on ResponseModel would get dropped. This would not be the case if you had annotated the field as plain ResponseModel instead of ResponseModel[Any]. Note that you can work around this in a type-checking context by doing something like:

if typing.TYPE_CHECKING:
    ResponseModelAny = ResponseModel[Any]
else:
    ResponseModelAny = ResponseModel 

but obviously that still kind of sucks. However, I still think this would be better for most common scenarios.

There are also still other questions I don't have immediate answers for, such as — how does this interact with RootModels? How does it interact with subclasses of parametrized generic models (or partially parametrized generic models 😨)? How does it work with generic models that are subclasses of non-generic models? How does it work with subclasses of generic models that add more generic parameters? I won't be surprised if the answers to many of these questions end up being simple/obvious, but I will be surprised if that's the case for all of them.

I'll also note that I think a better route for generics implementations long term will be to stop creating new classes for generics parametrizations, and instead have instances of a new subclass of types.GenericAlias (I think that's the one) be returned when you call BaseModel.__class_getitem__. If I recall correctly, I think that may break some usage patterns, but I think by the time we release v3 we should probably make that change (though probably makes sense to retain the current behavior to whatever extent that we can't make the not-same-class version work). Note though that although this eliminates a large class of possible issues caused by rejecting objects that aren't subclasses of the annotation, it still suffers from the problem of needing to decide how to re-validate instances of (parametrized) generic model types when passing them into other validators (model fields/type adapters/etc.) where the generic parameters are different (and in particular, possibly Any). If we go down that route, I think we'd probably want to store some metadata about the generic parameters that were used during validation, to enable you to skip revalidation if revalidating a validated instance against the same parameters. The main reason I bring this up is that I suspect it may simplify some aspects of the implementation, and I would weakly prefer to implement generic-instance-revalidation in a way that will work with this approach.

There are just so many edge cases that it's hard to get the behavior right under all circumstances without functionally implementing a type-variance-aware runtime type checker, which I think is probably outside the scope of pydantic. But I do think we can chip away at these scenarios by tweaking the way model validation works in pydantic core, and just slowly expand the set of scenarios that don't raise errors, starting with revalidating data (e.g., by OtherModel.model_validate(first_model.model_dump())) when first_model and OtherModel share a generic origin class.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug V2 Bug related to Pydantic V2 pending Awaiting a response / confirmation
Projects
None yet
Development

No branches or pull requests

3 participants