Adding metadata schema to the code base itself #7409
base: dev
Conversation
Signed-off-by: Eric Kerfoot <[email protected]>
for more information, see https://pre-commit.ci
Should this file exist here or be put elsewhere? Currently the schema is stored in …
Yes, just keep this as a draft PR to help review and leave comments. Once finished we can move it into …
```
@@ -0,0 +1,218 @@
{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
```
Hi @ericspod, will this new schema impact the existing bundles in the model zoo?
I ran the schema against the existing bundles; the ones it failed on were the new generative bundles, which didn't have a `network_data_format` and so are not compliant with the current schema anyway. The open question about whether to require this field needs resolving here: if we don't make it required, then all metadata.json files pass.
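To illustrate, here is a minimal sketch (not the actual schema file) of the check being discussed: if `network_data_format` stays in the schema's required list, a metadata.json shaped like the generative bundles' fails. The keys other than `network_data_format` are illustrative assumptions.

```python
import json

# Hypothetical required list; only network_data_format is confirmed by this
# thread, the other keys are illustrative assumptions.
SCHEMA_REQUIRED = ["version", "monai_version", "network_data_format"]

def missing_required(metadata: dict, required=SCHEMA_REQUIRED) -> list:
    """Return the required top-level keys absent from a metadata dict."""
    return [key for key in required if key not in metadata]

# A metadata.json shaped like the generative bundles': no network_data_format.
generative_meta = json.loads('{"version": "1.0", "monai_version": "1.3.0"}')
print(missing_required(generative_meta))  # -> ['network_data_format']
```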
My opinion on this one is that instead of renaming and relying on the pattern property, these two generative models are simply missing `network_data_format`, so we should add it to them. `network_data_format` should still be a required field.
https://github.com/Project-MONAI/model-zoo/blob/e99e61aeb6fd21fc8613fd6099463ca8d7420aca/models/brats_mri_axial_slices_generative_diffusion/configs/train_diffusion.json#L18C6-L18C17
What do you think?
I think it should stay required so that there's always an identifiable "main" network for a bundle. It should be a minor tweak to the generative bundles to make them conform.
One thing that is different, and that we can easily add, is that `required_packages_version` and `optional_packages_version` are both required. I think existing bundles should rename `optional_packages_version` to `required_packages_version` in their metadata if we want these properties to be required.
Can we prepare a test case, e.g. 1) using this schema and 2) preparing a fake metadata.json file to use …
monai/bundle/meta_schema.json
```
"pytorch_version",
"numpy_version",
"required_packages_version",
"optional_packages_version",
```
Since we already know all of the packages we need after the bundle is created, we could remove `optional_packages_version`, which may be confusing, and instead directly use `required_packages_version`. What do you think?
cc @Nic-Ma @yiheng-wang-nv
My idea was that `required_packages_version` is for additional packages this bundle absolutely requires which wouldn't be present with a basic MONAI install, e.g. nibabel, which would be present only if that install option was specified. `optional_packages_version` lists ones which the bundle uses but doesn't strictly need.
Yes, I know your points. But after the bundle is created, the creator needs to determine which packages are optional and then finish metadata.json. As for users, even if a user knows that a package is optional and wants the cheapest setup, they still need to modify the config file themselves, so does an optional package still make sense for the bundle? Just wondering if it's necessary to complicate this; I'm open to any opinions.
It's not necessary for the creator or the user to modify anything. We have networks, transforms, and other components which check optional dependencies and work fine if one isn't present. As an example, I'm working on a bundle which, in code in the `scripts` directory, optionally uses cupy but falls back to numpy if it isn't present.
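The fallback pattern described here can be sketched as follows; this is a simplified example, assuming numpy is always available in a basic install while cupy may not be:

```python
# Optional-dependency pattern: try the accelerator, fall back to the baseline.
# Under this reading, cupy goes in optional_packages_version, not required.
try:
    import cupy as xp  # optional accelerated array module
except ImportError:
    import numpy as xp  # present in any basic MONAI install

def normalize(values):
    """Scale values to the [0, 1] range with whichever module was imported."""
    arr = xp.asarray(values, dtype=xp.float32)
    return (arr - arr.min()) / (arr.max() - arr.min())

print([float(v) for v in normalize([0, 5, 10])])  # -> [0.0, 0.5, 1.0]
```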
We may need to clearly describe what "optional package" means here to avoid confusion. I was also confused at first look.
Thanks.
I've updated the schema description line to be a bit clearer. We would also document, in the docs and wherever else relevant, that the required packages are absolutely needed by the bundle, while the optional packages enable features or capabilities that aren't strictly needed but may allow broader behaviours or faster performance. E.g. if an operation somewhere can use cupy to accelerate something but will default to numpy if it isn't present, then cupy would be an optional package; or nibabel would be optional if the bundle can work with just numpy array loading and doesn't need NIfTI loading.
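Under that description, a metadata.json fragment might look like the following; package names and versions here are purely illustrative, not taken from any real bundle:

```python
import json

# Illustrative fragment only: nibabel is assumed truly needed by this bundle,
# while cupy merely accelerates an operation that falls back to numpy.
fragment = json.loads("""
{
    "required_packages_version": {"nibabel": "5.2.0"},
    "optional_packages_version": {"cupy": "13.0.0"}
}
""")
print(sorted(fragment))  # -> ['optional_packages_version', 'required_packages_version']
```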
Yes, I can put that together shortly.
Signed-off-by: Eric Kerfoot <[email protected]>
for more information, see https://pre-commit.ci
@yiheng-wang-nv I've added a notebook demonstrating this now. We won't want to merge this into dev; it's just a demo, as the schema file is.
Signed-off-by: Eric Kerfoot <[email protected]>
Outstanding discussion points:
Signed-off-by: Eric Kerfoot <[email protected]>
for more information, see https://pre-commit.ci
I've made …
Fixes #7303, #6959.
Description
This adds the schema file into the code base (but this maybe should be elsewhere). The changes implement a number of new things:

- a `$defs` section per the JSON Schema standard
- the `patternProperties` mechanism
- a `post_processed_outputs` section for outputs that are significantly changed by the post-process transforms defined in scripts
- additional network definitions besides `network_data_format`; these must follow the pattern `<name>_data_format`
- `required_packages_version` added in addition to `optional_packages_version`
#7253 depends on this schema change.
Types of changes

- `./runtests.sh -f -u --net --coverage`
- `./runtests.sh --quick --unittests --disttests`
- `make html` command in the `docs/` folder