
Adding metadata schema to the code base itself #7409

Draft: wants to merge 10 commits into dev

Conversation

ericspod (Member)

Fixes #7303, #6959.

Description

This adds the schema file to the code base (though perhaps it should live elsewhere). The changes implement a number of new things:

  • Moved definitions into a $defs section, per the JSON Schema standard
  • Permits networks to have multiple input arguments and return values with arbitrary names, using the patternProperties mechanism
  • Allows inputs and outputs to be numbers, booleans, or strings in addition to tensors
  • Outputs after post-processing can be specified in a post_processed_outputs section if they are significantly changed by the post-processing transforms defined in scripts
  • Multiple network IO formats can be specified in addition to network_data_format; these must follow the pattern <name>_data_format (see the sketch after this list)
  • Adds required_packages_version in addition to optional_packages_version
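
As an illustration of the patternProperties mechanism and the $defs section mentioned above, here is a minimal sketch expressed as a Python dict usable with the jsonschema package. The property names, the $defs entry, and the regular expression are assumptions for the example, not the exact contents of the schema in this PR.

```python
import jsonschema

# Sketch only: any property named "<name>_data_format" validates against a shared
# definition in "$defs", while "network_data_format" itself stays a required field.
schema_fragment = {
    "type": "object",
    "required": ["network_data_format"],
    "properties": {
        "network_data_format": {"$ref": "#/$defs/data_format"},  # assumed $defs name
    },
    "patternProperties": {
        "^\\w+_data_format$": {"$ref": "#/$defs/data_format"},  # assumed pattern
    },
    "$defs": {
        "data_format": {
            "type": "object",
            "properties": {"inputs": {"type": "object"}, "outputs": {"type": "object"}},
        }
    },
}

# A metadata fragment with an additional, arbitrarily named IO format block.
metadata_fragment = {
    "network_data_format": {"inputs": {}, "outputs": {}},
    "encoder_data_format": {"inputs": {}, "outputs": {}},
}
jsonschema.validate(metadata_fragment, schema_fragment)  # raises on validation failure
```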

#7253 depends on this schema change.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@ericspod (Member Author):

Should this file exist here or be put elsewhere? Currently the schema is stored in extra-test-data within a release, but it has to be somewhere accessible through a URL in the metadata.json files. #4048 references the issue of making this URL accessible.
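
For context, a bundle's metadata.json points at the schema through its schema field, which is why the file has to live at a stable, reachable URL. A minimal sketch as a Python dict; the URL and version numbers are placeholders:

```python
# Hypothetical metadata.json fragment: the "schema" value must be a URL from which
# the schema file can be downloaded, wherever it ends up being hosted.
metadata = {
    "schema": "https://example.com/meta_schema.json",  # placeholder URL
    "version": "0.1.0",
    "monai_version": "1.3.0",
    "pytorch_version": "2.1.0",
    "numpy_version": "1.24.0",
}
```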

@KumoLiu (Contributor) commented Jan 22, 2024:

> Should this file exist here or be put elsewhere? Currently the schema is stored in extra-test-data within a release, but it has to be somewhere accessible through a URL in the metadata.json files. #4048 references the issue of making this URL accessible.

Yes, let's just use this draft PR to help review and leave comments. Once it is finished we can move the file to extra-test-data or somewhere else.

@@ -0,0 +1,218 @@
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
Contributor:

Hi @ericspod, will this new schema impact the existing bundles in the model zoo?

Member Author:

I ran the schema against the existing bundles; the ones it failed on were the new generative bundles, which didn't have a network_data_format and so are not compliant with the current schema anyway. The open question of whether to require this field needs resolving here: if we don't make it required then all metadata.json files pass.

Contributor:

My opinion on this one is that, instead of renaming and relying on the pattern property, these two generative models are missing network_data_format, so we should add it to them. network_data_format should still be a required field.
https://github.com/Project-MONAI/model-zoo/blob/e99e61aeb6fd21fc8613fd6099463ca8d7420aca/models/brats_mri_axial_slices_generative_diffusion/configs/train_diffusion.json#L18C6-L18C17
What do you think?

Member Author:

I think it should stay required so that there is always a known "main" network for a bundle which can be found. It should be a minor tweak to the generative bundles to make them conform.

Member Author:

One thing that is different, and that we can easily add, is that required_packages_version and optional_packages_version are both required. I think existing bundles should rename optional_packages_version to required_packages_version if we want these properties to be required.

@yiheng-wang-nv (Contributor):

Can we prepare a test case, e.g. using 1) this schema and 2) a fake metadata.json file, and use verify_metadata in monai.bundle.scripts to verify that it works?
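
A minimal sketch of what such a check might look like, assuming verify_metadata takes the metadata file path and a local path where the downloaded schema is stored; the file paths here are hypothetical:

```python
from monai.bundle.scripts import verify_metadata

# Hypothetical paths: a fake metadata.json whose "schema" field points at the new
# schema, and a local location for the downloaded schema file.
verify_metadata(
    meta_file="tests/testing_data/fake_metadata.json",
    filepath="tests/testing_data/downloaded_schema.json",
)
```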

"pytorch_version",
"numpy_version",
"required_packages_version",
"optional_packages_version",
Contributor:

Since we already know all of the packages we need once the bundle has been created, we could remove optional_packages_version, which may be confusing, and directly use required_packages_version instead. What do you think?
cc @Nic-Ma @yiheng-wang-nv

Member Author:

My idea was that required_packages_version is for additional packages the bundle absolutely requires which wouldn't be present with a basic MONAI install, e.g. nibabel, which would be present only if that install option was specified. optional_packages_version is for packages which the bundle uses but doesn't strictly need.

Contributor:

Yes, I take your points. But after the bundle is created, the creator needs to determine which packages are optional and then finish metadata.json. For users, even if they know a package is optional and want the cheapest set of dependencies, they still need to modify the config file themselves, so does an optional package still make sense for the bundle? I'm just wondering whether it's necessary to complicate this; I'm open to any opinions.

Member Author:

It's not necessary for the creator or the user to modify anything. We have networks, transforms, and other components which check for optional dependencies and work fine if one isn't present. As an example, I'm working on a bundle which, in code in the scripts directory, optionally uses cupy but falls back to numpy if it isn't present (see the sketch below).
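
A minimal sketch of that kind of fallback; the helper function is a made-up example, not the actual code in that bundle:

```python
import numpy as np

try:
    import cupy as cp  # optional dependency: used only for GPU acceleration if installed
except ImportError:
    cp = None


def threshold(array, value):
    """Hypothetical helper: uses cupy when it is installed, plain numpy otherwise."""
    xp = cp if cp is not None else np  # choose the array module at runtime
    arr = xp.asarray(array)
    return (arr > value).astype(arr.dtype)
```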

Contributor:

We may need to clearly describe what an "optional package" means here to avoid confusion. I was also confused at first glance.

Thanks.

Member Author:

I've updated the schema description line to be a bit clearer. We would also document, in the docs and wherever else is relevant, that required packages are absolutely needed by the bundle, while optional packages enable features or capabilities that are not strictly needed but may allow broader behaviour or faster performance. E.g. if an operation can use cupy to accelerate something but defaults to numpy when it isn't present, then cupy would be an optional package; likewise nibabel would be optional if the bundle can work with plain numpy array loading and doesn't need NIfTI loading.
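
For illustration, a sketch of how the two sections might then look in a bundle's metadata, expressed as a Python dict; the package names match the examples above and the versions are placeholders:

```python
# Hypothetical fragment: nibabel is strictly needed (NIfTI loading), while cupy only
# accelerates an operation that otherwise falls back to numpy.
metadata_fragment = {
    "required_packages_version": {"nibabel": "5.2.0"},  # placeholder version
    "optional_packages_version": {"cupy": "13.0.0"},    # placeholder version
}
```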

@ericspod (Member Author):

> Can we prepare a test case, e.g. using 1) this schema and 2) a fake metadata.json file, and use verify_metadata in monai.bundle.scripts to verify that it works?

Yes, I can put that together shortly.

@ericspod (Member Author) commented Feb 3, 2024:

> Can we prepare a test case, e.g. using 1) this schema and 2) a fake metadata.json file, and use verify_metadata in monai.bundle.scripts to verify that it works?

@yiheng-wang-nv I've added a notebook demonstrating this now. We won't want to merge this into dev; it's just a demo, as the schema file is.

@ericspod (Member Author):

Outstanding discussion points:

  • Should required_packages_version and/or optional_packages_version be required at all? If required_packages_version is required then model zoo bundles will need updating.
  • Should we allow multiple network formats with added [NAME]_data_format blocks but require there to always be a network_data_format block? The diffusion models in the zoo will need fixing if so.

@ericspod (Member Author):

I've made required_packages_version mandatory but not optional_packages_version. We'll keep the [NAME]_data_format content of the schema. I've added documentation about other details as well.
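
For reference, a sketch of the resulting top-level required list, based on the snippet quoted earlier in this thread; treat it as illustrative rather than the exact final schema:

```python
# optional_packages_version is deliberately absent; network_data_format stays required.
required_fields = [
    "pytorch_version",
    "numpy_version",
    "required_packages_version",
    "network_data_format",
]
```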

Successfully merging this pull request may close these issues: Update the schema for MONAI Bundle