Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve typing of the metadata parameter #1490

Draft
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

JeanChristopheMorinPerso
Copy link
Member

@JeanChristopheMorinPerso JeanChristopheMorinPerso commented Dec 4, 2022

This PR is part of the series of PRs I have created to improve the types exposed by the Python bindings.

More particularly, this PR improves the metadata parameter type. But this is much easier said than done.

Note that I'm not fluent in C++, so it's entirely possible that I made some novice mistakes or that I did some bad things. I'll be more than happy to get feedback and learn new things.

Changes

  • Introduction of a new custom Pybind11 type caster that allows to expose an AnyDictionaryProxy that can both be an AnyDictionaryProxy and py::dict (this allows to keep compatibility with the previous implementation). It also handles recursively converting the types of the sub elements.

    This new caster is kind of special because like I said, it will handle both AnyDictionaryProxy and py::dict. It's also special because it will render the AnyDictionaryProxy type in Python docstrings as Metadata.

  • Replaced py_to_any_dictionary with a pure C++ implementation, which removes the awkward Python > C++ > Python > C++ > Python round trip that it was doing to convert a py::object into an AnyDictionary.

  • New opentimelineio.core.Metadata and opentimelineio.core.MetadataValue type aliases (defined in Python) that allows to have types clearly defined in Python. It's also nicely exposed in the Sphinx docs. This part is still pretty WIP. It kind of works but mypy isn't happy with it and it also requires a dirty hack for Sphinx.

  • Added ability to construct AnyDictionary and AnyVector objects with values in Python. This means it's now possible to do AnyDictionary({'asd': 1234}). This is not strictly required for this PR, but since I was in the code for these and I always found it annoying that I couldn't do that, I thought I should just do it here.

Side effects

I found no noticeable side effects, except that it fixes #1474.

I'm planning on running https://github.com/bloomberg/memray to see if there is a difference in memory usage. It'd be nice if some people could try on big production OTIO files. That would actually also help to confirm that I didn't introduce a regression in functionality and behavior.

I'm a little bit worried about how the values are shared between C++ and Python, but so far found no evidence that any of this changed with my changes. For example:

>>> so = opentimelineio._otio.SerializableObjectWithMetadata(metadata={'asd': 'asd'})
>>> so.metadata
{'asd': 'asd'}
>>> 
>>> d = so.metadata
>>> d['key'] = 'value'
>>> so.metadata
{'asd': 'asd', 'key': 'value'}

So this still works and

>>> d = {}
>>> so = opentimelineio._otio.SerializableObjectWithMetadata(metadata=d)
>>> so.metadata
{}
>>> d['key'] = 'value'
>>> so.metadata
{}

still behaves as before. I still have to try modifying the metadata from the C++ side though.

Explanations

I have already written on the subject, see https://gist.github.com/JeanChristopheMorinPerso/0ad46432bf2c251b920c2bf648a86fe9#pyobject-as-parameter-type-in-class-signatures.

Previously, the metadata was exposed as py::object in constructors:

py::class_<SOWithMetadata, SerializableObject,
               managing_ptr<SOWithMetadata>>(m, "SerializableObjectWithMetadata", py::dynamic_attr())

        // Notice the py::object
        .def(py::init([](std::string name, py::object metadata) {
                    return new SOWithMetadata(name, py_to_any_dictionary(metadata));
                }),
            py::arg_v("name"_a = std::string()),
            py::arg_v("metadata"_a = py::none()));

The main problem with this practice is that the signature becomes:

__init__(...)
    __init__(name: str = '', metadata: object = None) -> None

It's basically impossible to know what kind of type the metadata should be just by looking at the signature. But there is another thing to note about the previous approach. py_to_any_dictionary was doing a round-trip to python to do the type conversions! The round-trip looked like: Python (constructing the SerializableObjectWithMetadata object) > C++ (we are now in pybind11) > Python (py_to_any_dictionary calls opentimelineio._core._core_utils._value_to_any) > C++ (_value_to_any would create PyAny objects) > Python (PyAny returned to Python) > C++ (py_to_any_dictionary gets the return value from _value_to_any) > Python (Pybind11 does its thing and returns a SerializableObjectWithMetadata to Python).

That's pretty hacky and it was only to convert types. All this round tripping can be entirely replaced with a pure C++ approach.

With that, we now have:

py::class_<SOWithMetadata, SerializableObject,
               managing_ptr<SOWithMetadata>>(m, "SerializableObjectWithMetadata", py::dynamic_attr())

        // Notice that we now accept an AnyDictionaryProxy
        .def(py::init([](std::string name, AnyDictionaryProxy* metadata) {
                    return new SOWithMetadata(name, metadata->fetch_any_dictionary());
                }),
            py::arg_v("name"_a = std::string()),
            py::arg_v("metadata"_a = py::dict())); // Note the default is now an empty dict

This now renders like this in Python:

__init__(...)
    __init__(name: str = '', metadata: opentimelineio.core.Metadata = {}) -> None

Even with this change, there is still one round-trip in the code base. And that remaining round-trip will be pretty hard to get rid of (at least for me. I guess it would be much easier for a C++ developer). The round-trip is basically https://github.com/AcademySoftwareFoundation/OpenTimelineIO/blob/main/src/py-opentimelineio/opentimelineio/core/_core_utils.py#L121. AnyDictionaryProxy C++ class (not the binding) wants PyAny objects, see https://github.com/AcademySoftwareFoundation/OpenTimelineIO/blob/main/src/py-opentimelineio/opentimelineio-bindings/otio_anyDictionary.h#L65. I tried to change that so that it takes Something else, but I couldn't come up with something that worked.

Signed-off-by: Jean-Christophe Morin <[email protected]>
Signed-off-by: Jean-Christophe Morin <[email protected]>
Signed-off-by: Jean-Christophe Morin <[email protected]>
Signed-off-by: Jean-Christophe Morin <[email protected]>
Signed-off-by: Jean-Christophe Morin <[email protected]>
Signed-off-by: Jean-Christophe Morin <[email protected]>
Signed-off-by: Jean-Christophe Morin <[email protected]>
Signed-off-by: Jean-Christophe Morin <[email protected]>
… to convert types

Signed-off-by: Jean-Christophe Morin <[email protected]>
Signed-off-by: Jean-Christophe Morin <[email protected]>
…xy or a py::dict

Signed-off-by: Jean-Christophe Morin <[email protected]>
Signed-off-by: Jean-Christophe Morin <[email protected]>
Signed-off-by: Jean-Christophe Morin <[email protected]>
Signed-off-by: Jean-Christophe Morin <[email protected]>
Signed-off-by: Jean-Christophe Morin <[email protected]>
Signed-off-by: Jean-Christophe Morin <[email protected]>
@meshula
Copy link
Collaborator

meshula commented Mar 19, 2023

This diff is pretty large, and I realize it's still a draft. I think it might make sense to schedule in cooperation with the TSC, an office hours zoom to have a look at it and help things a long.

@JeanChristopheMorinPerso
Copy link
Member Author

Sure! I'll be happy to have a call with TSC members to do a walk through of the changes.

Also, it's marked as a draft because there is still a thing or two to verify and tweak. For example the Metadata class should be renamed to something else (probably Dict, or something like that). And I need to make sure this Metadata class works with mypy and co. I also want to profile the changes and compare with the old code.

I'll be actively working on that in the next 4 weeks.

@JeanChristopheMorinPerso
Copy link
Member Author

I can also split the part that allows to construct AnyVector and AnyDictionary with values in a separate PR since it's unrelated to this PR.

@meshula
Copy link
Collaborator

meshula commented Mar 23, 2023

splitting the PR as you suggest would be a good idea.

@JeanChristopheMorinPerso
Copy link
Member Author

I extracted some bits into #1574. I'll update the PR once #1574 gets merged.

@JeanChristopheMorinPerso
Copy link
Member Author

I also created #1575 to split another part out.

Comment on lines +166 to +167
SerializableObject::Retainer<> r(py::cast<SerializableObject*>(o));
return create_safely_typed_any(r.take_value());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit that I'm not too sure how the downcast works here, but it works. This is taken from https://github.com/AcademySoftwareFoundation/OpenTimelineIO/blob/main/src/py-opentimelineio/opentimelineio-bindings/otio_bindings.cpp#L31-L40.

We'll need more tests to make sure that everything works as expected.

@@ -32,6 +38,17 @@
release_to_schema_version_map,
)

Metadata: TypeAlias = Union[Dict[str, 'MetadataValue'], AnyDictionary]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these to work, we'll have to ove them into a stub file, more specifically opentimelineio._otio.pyi. We'll have to also create stubs for our bindings, which can be automated using pybind11-stubgen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segmentation fault on pytest exit when creating a SerializableObjectWithMetadata
2 participants