Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: class docstring overrides default description from __schema__ #239

Closed
wants to merge 2 commits into from

Conversation

apirogov
Copy link

@apirogov apirogov commented Sep 27, 2022

fixes #238

I do it in __modify_schema__, so that the docstring, if present, is applied last and not first (which would be the case when just adding it in __schema__)

@apirogov
Copy link
Author

apirogov commented Sep 27, 2022

@antonagestam I figured I just open the pull request and use the CI pipeline to run tests, could you please approve that? :) First I wanted to see if nothing breaks.
Then I also will

  • add a few simple tests in tests/pydantic/test_schemas.py
  • update docs/pages/pydantic-support.rst

and then mark it as ready.

Comment on lines 50 to 51
Note that the docstring attached to a phantom type class, if present,
will override any description that is defined in `__schema__()`.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the behavior we want to introduce. We should still allow users to override with an explicit value for "description" in __schema__. A really easy way to achieve that should be to move the implementation from __modify_schema__ to __schema__ where you can literally just do {..., "description": cls.__doc__}.

Copy link
Author

@apirogov apirogov Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I don't think that it will have the intended effect. Consider:

class FancyScore(float, Closed, low=0, high=10):
    """My fancy description"""

Now if I call schema() on a model using it, as I understand, the chain would go:

  • FancyScore.__modify_schema__() (which is SchemaField.__modify_schema__)
  • uses FancyScore.__schema__()
  • which inherits Closed.__schema__() <- provides a default description overriding what follows below
  • which uses super().__schema__() (possibly through a few superclasses)
  • which uses SchemaField.__schema__() <- where you propose to add the docstring

(if FancyScore would also define __schema__, it would not change that general logic and problem)

If you think that __schema__ should have priority over the docstring, then a different approach is needed - in that case we need to recognize where the description came from - was it overridden in the same class inside __schema__, or was it inherited from a base class? If it was inherited, the docstring of the current class should provide the default.

Without changing the general machinery, this would only be achievable ad-hoc by modifying all __schema__ methods. Actually, not sure how this could be done most elegantly without sandwiching some other method into the cascade or introducing metaclass magic (like: check __schema__ of the base class and the new class, compare description - if it changed/was not present before, then it was overriden. If it was not overridden, use docstring)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for the detailed description of the problem, I didn't have all the pieces in my head. I'll have to tinker with it a bit to see if I can come up with something. Otherwise I'll probably be reluctant to introduce this feature, unfortunately.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, no problem! I think the Meta class approach could work, in principle.

When a new class is subclassed:

  • no docstring -> all good
  • docstring, no defined schema method -> generate schema method with docstring as description
  • docstring, defines schema method -> call both inherited and new method to see if description (if any) is "new", if not there or not new, wrap method to add the docstring to the result.

That way it would have the desired behavior, but it's up to you to decide if this is an acceptable solution. The last case is "ugly", but still it's a check needed only once during class creation, and it could (I actually expect it to) be a rare case that both things are present.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long wait here, I have now looked closer and your assessment is probably correct, however, that makes me not want to introduce this feature as-is.

I think an alternative, if you think it's worth pursuing, would be introduce a class parameter. This is what I think the spec should be:

class Base(str, Phantom):
    @classmethod
    def __schema__(cls) -> Schema:
        return super().__schema__() | {"description": "base description"}

class A(Base):
    """docstring"""

    @classmethod
    def __schema__(cls) -> Schema:
        return super().__schema__() | {"description": "explicit override"}

class B(Base, docstring_description=True):
    """docstring"""

    @classmethod
    def __schema__(cls) -> Schema:
        return super().__schema__() | {"description": "explicit override"}

class C(Base):
    """docstring"""

class Base2(str, Phantom, docstring_description=True):
    """base docstring"""

class D(Base2):
    """docstring"""
    
class DataModel(pydantic.BaseModel):
    a: A
    b: B
    c: C
    d: D

assert DataModel.schema()["properties"]["a"]["description"] == "explicit override"
assert DataModel.schema()["properties"]["b"]["description"] == "docstring"
assert DataModel.schema()["properties"]["c"]["description"] == "base description"
assert DataModel.schema()["properties"]["d"]["description"] == "base docstring"

What do you think? use_docstring might be a better name for the parameter.

Copy link
Author

@apirogov apirogov Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So your modification (over what I proposed) is: instead of introducing the complicated checks and comparisons, to ask the user to opt-in explicitly, which would make the class docstring override any inherited or provided schema description, right?

I would be fine with that, especially if it's called use_docstring - it would still reduce a lot of boilerplate :)

@apirogov
Copy link
Author

apirogov commented Jan 2, 2023

@antonagestam

I updated the PR with the flag-based version. After playing around a bit I decided that when use_docstring is set, providing a docstring is mandatory, as otherwise the behavior is unintuitive. So it works like this:

from __future__ import annotations

from phantom import Phantom
from phantom.schema import Schema
import pydantic

class Base(str, Phantom, predicate=lambda _: True):
    @classmethod
    def __schema__(cls) -> Schema:
        return {
            **super().__schema__(),  # type: ignore
            "description": "base description"
        }

class A(Base):
    """docstring a"""

    @classmethod
    def __schema__(cls) -> Schema:
        return {
            **super().__schema__(),  # type: ignore
            "description": "explicit override a"
        }

class B(Base, use_docstring=True):
    """docstring b"""

    @classmethod
    def __schema__(cls) -> Schema:
        return {
            **super().__schema__(),  # type: ignore
            "description": "explicit override b"
        }


class C(B):
    """docstring c"""

    @classmethod
    def __schema__(cls) -> Schema:
        return {
            **super().__schema__(),  # type: ignore
            "description": "explicit override c"
        }

class D(C, use_docstring=False):
    """docstring d"""

    @classmethod
    def __schema__(cls) -> Schema:
        return {
            **super().__schema__(),  # type: ignore
            "description": "explicit override d"
        }
class DataModel(pydantic.BaseModel):

    a: A
    b: B
    c: C
    d: D

x = DataModel.schema()["properties"]
for k in ["a", "b", "c", "d"]:
    print(x[k]["description"])

assert x["a"]["description"] == "explicit override a"
assert x["b"]["description"] == "docstring b"
assert x["c"]["description"] == "docstring c"
assert x["d"]["description"] == "explicit override d"

What do you think?

@antonagestam
Copy link
Owner

@apirogov Initial reaction is it feels unintuitive for this to be inherited, ie I think C shouldn't use its docstring. I'll give this some further thought and come back with a proper review as soon as I find the time.

@apirogov
Copy link
Author

apirogov commented Jan 3, 2023

In my case, there could be many types defined in that way, so a way to set this flag to "infect" all subclasses or toggle this globally would be ideal.

But I agree that all unintuitive effects seem to be connected to expectations and possible behaviors wrt. to inheritance.

Maybe you are right to not like it, I am not sure myself. In any case this would be just a minor change to disable. Let me know if you have some better alternative idea, or I can also remove the inheritance bit :)

Copy link
Owner

@antonagestam antonagestam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments that need to be addressed, and the test suite is currently failing, please make sure it's passing and I can take another round of reviewing this.

In addition to that I think we need to add tests for this feature, that will also help me in understanding how your implementation works.


With the regards to the chosen solution, I still feel like the spec I proposed makes more sense. Perhaps we should discuss in more detail what you found unintuitive about that?

Comment on lines +137 to +138
if kwargs:
raise RuntimeError("Unknown phantom type argument(s): {kwargs}")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: I don't think we should add this check, any arguments falling through will be handled by the super call.

Copy link
Author

@apirogov apirogov Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I personally added it after wasting 10 minutes trying to understand why my keyword argument was not taken and it turned out that I had a typo - because the default error by the super call is absolutely not helpful and says (.... takes no keword arguments...) needlessly confusing me.

As the immediate super class does not have an init_subclass method, I decided to put it where it is to prevent passing unknown keywords to the default implementaion and getting that error.

I personally believe that such a check improves ergonomy over relying on some method upstream choke on it, but this is my personal preference and opinion. If you strongly disagree, I can remove it.

src/phantom/_base.py Outdated Show resolved Hide resolved
src/phantom/schema.py Outdated Show resolved Hide resolved
src/phantom/schema.py Outdated Show resolved Hide resolved
src/phantom/schema.py Outdated Show resolved Hide resolved
src/phantom/schema.py Outdated Show resolved Hide resolved
src/phantom/schema.py Outdated Show resolved Hide resolved
Comment on lines 59 to 61
if cls.__use_docstring__:
if ds := desc_from_docstring(cls):
field_schema["description"] = ds
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Can we move this to __schema__? I'd like to avoid adding logic here.

Copy link
Author

@apirogov apirogov Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can't. If the docstring is to take precedence over what overrides of __schema__ do, it must happen at the call site of __schema__, i.e. __modify_schema__, which is what we already discussed in the initial version I pushed.

Even worse, you still might want to change other fields and thus define __schema__ to set eg another title or add examples, but the description should come from the docstring, patched over it. So it must be outside.

In your example C, I might still want docstring c as description but add other things in __schema__. If I put this into the default implementation, this whole thing cannot work. Basically see my call stack explanation above.

src/phantom/schema.py Outdated Show resolved Hide resolved
src/phantom/schema.py Outdated Show resolved Hide resolved
@apirogov
Copy link
Author

apirogov commented Jan 4, 2023

Concerning your proposal, I simply did not see how intended behavior with inheritance was supposed to be and opted for it. I will remove it.

At first I found your example D unintuitive, but if the intent is to pass use_docstring on each subclass, now I see your intuition of it. Will try to adapt it accordingly.

@apirogov
Copy link
Author

apirogov commented Jan 4, 2023

@antonagestam

I pushed new changes. I made the code work to conform to your interface but there are some problems with how you wish your example D should work vs how the behavior in E turns out and is hard to fix. I'd suggest to change the behavior of D (assertion E shows what I expect, but currently it's violated)

from __future__ import annotations

from phantom import Phantom
from phantom.schema import Schema
import pydantic

class Base(str, Phantom, predicate=lambda _: True):
    @classmethod
    def __schema__(cls) -> Schema:
        return {
            **super().__schema__(),  # type: ignore
            "description": "base description"
        }

class A(Base):
    """docstring"""

    @classmethod
    def __schema__(cls) -> Schema:
        return {
            **super().__schema__(),  # type: ignore
            "description": "explicit override"
        }

class B(Base, use_docstring=True):
    """docstring"""

    @classmethod
    def __schema__(cls) -> Schema:
        return {
            **super().__schema__(),  # type: ignore
            "description": "explicit override"
        }


class C(Base):
    """docstring"""

class Base2(str, Phantom, use_docstring=True, predicate=lambda _: True):
    """base docstring"""

class D(Base2):
    """docstring"""

class Base3(str, Phantom, use_docstring=True, predicate=lambda _: True):
    """base docstring"""

    @classmethod
    def __schema__(cls) -> Schema:
        return {
            **super().__schema__(),  # type: ignore
            "description": "base description"
        }

class E(Base3):
    """docstring"""


class DataModel(pydantic.BaseModel):
    a: A
    b: B
    c: C
    d: D
    e: E

x = DataModel.schema()["properties"]
assert x["a"]["description"] == "explicit override"
assert x["b"]["description"] == "docstring"
assert x["c"]["description"] == "base description"
assert x["d"]["description"] == "base docstring"

# Problem:
assert x["e"]["description"] == "base docstring"  # would be my intuition
# but making sure that D works as you say makes it not possible for E to work correctly
# there is, again, no way to check whether the description was overridden in E
# or in __schema__ of a parent class.

I think instead of trying ugly things to make E work, rather I would change the behavior so that

  • in case D, there is no description at all (use_docstring acts only locally)
  • in case E, there is "base description" as description (from the __schema__ method)

then it would be consistent and the fixup method could be simplified / removed.

If we agree on the implementation and behavior details I would add tests and fix the pipelines and remove dead code/comments and rebase and squash everything for merge (so you don't have to point it out).

By the way, the | syntax for Union is also not supported in 3.7, but you are using it in many places. I am slightly bewildered by this. Does this not cause any problems?

@antonagestam
Copy link
Owner

I'll close this for now until there's a more clear idea about how to move forward with this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use docstring as schema description
2 participants