
Adding metadata schema to the code base itself #7409

Draft: wants to merge 10 commits into dev

Conversation

ericspod (Member)

Fixes #7303, #6959.

Description

This adds the schema file to the code base (though perhaps it should live elsewhere). The changes implement a number of new things:

  • Moved definitions into a $defs section, per the JSON Schema standard
  • Permits networks to have multiple input arguments and return values with arbitrary names, using the patternProperties mechanism
  • Allows inputs and outputs to be numbers, booleans, or strings in addition to tensors
  • Outputs after post-processing can be specified in a post_processed_outputs section if they are significantly changed by the post-processing transforms defined in scripts
  • Multiple network IO formats can be specified in addition to network_data_format; these must follow the pattern <name>_data_format (see the sketch after this list)
  • Adds required_packages_version in addition to optional_packages_version
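
As an illustration of the patternProperties mechanism and the $defs section mentioned above, here is a minimal sketch expressed as a Python dict usable with the jsonschema package. The property names, the $defs entry, and the regular expression are assumptions for the example, not the exact contents of the schema in this PR.

```python
import jsonschema

# Sketch only: any property named "<name>_data_format" validates against a shared
# definition in "$defs", while "network_data_format" itself stays a required field.
schema_fragment = {
    "type": "object",
    "required": ["network_data_format"],
    "properties": {
        "network_data_format": {"$ref": "#/$defs/data_format"},  # assumed $defs name
    },
    "patternProperties": {
        "^\\w+_data_format$": {"$ref": "#/$defs/data_format"},  # assumed pattern
    },
    "$defs": {
        "data_format": {
            "type": "object",
            "properties": {"inputs": {"type": "object"}, "outputs": {"type": "object"}},
        }
    },
}

# A metadata fragment with an additional, arbitrarily named IO format block.
metadata_fragment = {
    "network_data_format": {"inputs": {}, "outputs": {}},
    "encoder_data_format": {"inputs": {}, "outputs": {}},
}
jsonschema.validate(metadata_fragment, schema_fragment)  # raises on validation failure
```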

#7253 depends on this schema change.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@ericspod (Member Author):

Should this file exist here or be put elsewhere? Currently the schema is stored in extra-test-data within a release, but it has to be somewhere accessible through a URL in the metadata.json files. #4048 references the issue of making this URL accessible.
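
For context, a bundle's metadata.json points at the schema through its schema field, which is why the file has to live at a stable, reachable URL. A minimal sketch as a Python dict; the URL and version numbers are placeholders:

```python
# Hypothetical metadata.json fragment: the "schema" value must be a URL from which
# the schema file can be downloaded, wherever it ends up being hosted.
metadata = {
    "schema": "https://example.com/meta_schema.json",  # placeholder URL
    "version": "0.1.0",
    "monai_version": "1.3.0",
    "pytorch_version": "2.1.0",
    "numpy_version": "1.24.0",
}
```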

@KumoLiu (Contributor) commented Jan 22, 2024:

> Should this file exist here or be put elsewhere? Currently the schema is stored in extra-test-data within a release, but it has to be somewhere accessible through a URL in the metadata.json files. #4048 references the issue of making this URL accessible.

Yes, let's just use this draft PR to help review and leave comments. Once it is finished we can move the file to extra-test-data or somewhere else.

@@ -0,0 +1,218 @@
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
Contributor:

Hi @ericspod, will this new schema impact the existing bundles in the model zoo?

Member Author:

I ran the schema against the existing bundles; the ones it failed on were the new generative bundles, which didn't have a network_data_format and so are not compliant with the current schema anyway. The open question of whether to require this field needs resolving here: if we don't make it required then all metadata.json files pass.

Contributor:

My opinion on this one is that, instead of renaming and relying on the pattern property, these two generative models are missing network_data_format, so we should add it to them. network_data_format should still be a required field.
https://github.com/Project-MONAI/model-zoo/blob/e99e61aeb6fd21fc8613fd6099463ca8d7420aca/models/brats_mri_axial_slices_generative_diffusion/configs/train_diffusion.json#L18C6-L18C17
What do you think?

Member Author:

I think it should stay required so that there is always a known "main" network for a bundle which can be found. It should be a minor tweak to the generative bundles to make them conform.

Member Author:

One thing that is different, and that we can easily add, is that required_packages_version and optional_packages_version are both required. I think existing bundles should rename optional_packages_version to required_packages_version if we want these properties to be required.

@yiheng-wang-nv (Contributor):

Can we prepare a test case, e.g. using 1) this schema and 2) a fake metadata.json file, and use verify_metadata in monai.bundle.scripts to verify that it works?
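
A minimal sketch of what such a check might look like, assuming verify_metadata takes the metadata file path and a local path where the downloaded schema is stored; the file paths here are hypothetical:

```python
from monai.bundle.scripts import verify_metadata

# Hypothetical paths: a fake metadata.json whose "schema" field points at the new
# schema, and a local location for the downloaded schema file.
verify_metadata(
    meta_file="tests/testing_data/fake_metadata.json",
    filepath="tests/testing_data/downloaded_schema.json",
)
```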

"pytorch_version",
"numpy_version",
"required_packages_version",
"optional_packages_version",
Contributor:

Since we already know all of the packages we need once the bundle has been created, we could remove optional_packages_version, which may be confusing, and directly use required_packages_version instead. What do you think?
cc @Nic-Ma @yiheng-wang-nv

Member Author:

My idea was that required_packages_version is for additional packages the bundle absolutely requires which wouldn't be present with a basic MONAI install, e.g. nibabel, which would be present only if that install option was specified. optional_packages_version is for packages which the bundle uses but doesn't strictly need.

Contributor:

Yes, I take your points. But after the bundle is created, the creator needs to determine which packages are optional and then finish metadata.json. For users, even if they know a package is optional and want the cheapest set of dependencies, they still need to modify the config file themselves, so does an optional package still make sense for the bundle? I'm just wondering whether it's necessary to complicate this; I'm open to any opinions.

Member Author:

It's not necessary for the creator or the user to modify anything. We have networks, transforms, and other components which check for optional dependencies and work fine if one isn't present. As an example, I'm working on a bundle which, in code in the scripts directory, optionally uses cupy but falls back to numpy if it isn't present (see the sketch below).
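
A minimal sketch of that kind of fallback; the helper function is a made-up example, not the actual code in that bundle:

```python
import numpy as np

try:
    import cupy as cp  # optional dependency: used only for GPU acceleration if installed
except ImportError:
    cp = None


def threshold(array, value):
    """Hypothetical helper: uses cupy when it is installed, plain numpy otherwise."""
    xp = cp if cp is not None else np  # choose the array module at runtime
    arr = xp.asarray(array)
    return (arr > value).astype(arr.dtype)
```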

Contributor:

We may need to clearly describe what an "optional package" means here to avoid confusion. I was also confused at first glance.

Thanks.

Member Author:

I've updated the schema description line to be a bit clearer. We would also document, in the docs and wherever else is relevant, that required packages are absolutely needed by the bundle, while optional packages enable features or capabilities that are not strictly needed but may allow broader behaviour or faster performance. E.g. if an operation can use cupy to accelerate something but defaults to numpy when it isn't present, then cupy would be an optional package; likewise nibabel would be optional if the bundle can work with plain numpy array loading and doesn't need NIfTI loading.
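
For illustration, a sketch of how the two sections might then look in a bundle's metadata, expressed as a Python dict; the package names match the examples above and the versions are placeholders:

```python
# Hypothetical fragment: nibabel is strictly needed (NIfTI loading), while cupy only
# accelerates an operation that otherwise falls back to numpy.
metadata_fragment = {
    "required_packages_version": {"nibabel": "5.2.0"},  # placeholder version
    "optional_packages_version": {"cupy": "13.0.0"},    # placeholder version
}
```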

@ericspod (Member Author):

> Can we prepare a test case, e.g. using 1) this schema and 2) a fake metadata.json file, and use verify_metadata in monai.bundle.scripts to verify that it works?

Yes, I can put that together shortly.

@ericspod (Member Author) commented Feb 3, 2024:

> Can we prepare a test case, e.g. using 1) this schema and 2) a fake metadata.json file, and use verify_metadata in monai.bundle.scripts to verify that it works?

@yiheng-wang-nv I've added a notebook demonstrating this now. We won't want to merge this into dev; it's just a demo, as the schema file is.

@ericspod (Member Author):

Outstanding discussion points:

  • Should required_packages_version and/or optional_packages_version be required at all? If required_packages_version is required then model zoo bundles will need updating.
  • Should we allow multiple network formats with added [NAME]_data_format blocks but require there to always be a network_data_format block? The diffusion models in the zoo will need fixing if so.

@ericspod (Member Author):

I've made required_packages_version mandatory but not optional_packages_version. We'll keep the [NAME]_data_format content of the schema. I've added documentation about other details as well.
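
For reference, a sketch of the resulting top-level required list, based on the snippet quoted earlier in this thread; treat it as illustrative rather than the exact final schema:

```python
# optional_packages_version is deliberately absent; network_data_format stays required.
required_fields = [
    "pytorch_version",
    "numpy_version",
    "required_packages_version",
    "network_data_format",
]
```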

Successfully merging this pull request may close these issues: Update the schema for MONAI Bundle