Implement tool for saved Keras model file inspection, diff, and patching. #19705

Open
pmasousa opened this issue May 10, 2024 · 6 comments
Labels
stat:contributions welcome A pull request to fix this issue would be welcome. type:feature The user is asking for a new feature.

Comments

@pmasousa

Hello!
I saw this feature on "🚀 Contributing to Keras 🚀" and I want to know if I can start developing it.
The tool can:

  • Take a fname.keras file and display the manifest of its contents (including weights file structure)
  • Take a fname.weights.h5 and display the manifest of its content.
  • Diff two weights files, highlighting what's in one and not in the other.
  • Patch a file, by replacing a given weight with a different value provided by the user.
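
As a rough illustration of the diff feature, here is a minimal sketch that compares two weight manifests. It assumes each manifest has already been flattened into a plain dict mapping a weight path to a `(shape, dtype)` pair; that layout, the function name `diff_manifests`, and the sample paths are hypothetical and not part of Keras.

```python
def diff_manifests(left, right):
    """Compare two {weight_path: (shape, dtype)} manifests.

    Returns the paths present on only one side, plus the paths present
    on both sides whose shape or dtype differs.
    """
    left_keys, right_keys = set(left), set(right)
    return {
        "only_in_left": sorted(left_keys - right_keys),
        "only_in_right": sorted(right_keys - left_keys),
        "mismatched": sorted(
            k for k in left_keys & right_keys if left[k] != right[k]
        ),
    }


# Hypothetical manifests, for illustration only.
a = {
    "layers/dense/vars/0": ((4, 8), "float32"),
    "layers/dense/vars/1": ((8,), "float32"),
}
b = {
    "layers/dense/vars/0": ((4, 16), "float32"),
    "layers/dense_1/vars/0": ((16, 1), "float32"),
}
result = diff_manifests(a, b)
```

A real tool would probably also compare the weight values themselves, which this sketch leaves out.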
@sachinprasadhs sachinprasadhs added type:feature The user is asking for a new feature. keras-team-review-pending Pending review by a Keras team member. labels May 10, 2024
@fchollet
Member

Sure, you are welcome to work on that. Do you have any experience with web development? I'm thinking this tool may benefit from an interactive js/html interface to be used in a notebook.

@pmasousa
Author

I'm in my third year of computer science and engineering, and we have already used JS and HTML for some course projects. I also have some web development experience from side projects, both past and ongoing. I will be doing this with @pedro-curto, who is in the same year and university as I am.
I also agree that this tool would be better with an interface, so I'll be glad to build one if you agree.

@fchollet
Member

Sounds great.

Here's an example of a draft I wrote a long time ago, displaying a kind of summary of a file's content:

import json
import zipfile

# Private helpers from keras/src/saving; names may differ across Keras versions.
from keras.src.saving.saving_lib import (
    _CONFIG_FILENAME,
    _METADATA_FILENAME,
    _VARS_FNAME,
    H5IOStore,
)
from keras.src.saving.serialization_lib import deserialize_keras_object


def inspect_file(
    filepath, reference_model=None, custom_objects=None, print_fn=print
):
    filepath = str(filepath)
    if filepath.endswith(".keras"):
        with zipfile.ZipFile(filepath, "r") as zf:
            print_fn(f"Keras model file '{filepath}'")

            with zf.open(_CONFIG_FILENAME, "r") as f:
                config = json.loads(f.read())
                print_fn(
                    f"Model: {config['class_name']} name='{config['config']['name']}'"
                )
            # reference_model is built here but not yet compared to the file.
            if reference_model is None:
                reference_model = deserialize_keras_object(
                    config, custom_objects=custom_objects
                )

            with zf.open(_METADATA_FILENAME, "r") as f:
                metadata = json.loads(f.read())
                print_fn(f"Saved with Keras {metadata['keras_version']}")
                print_fn(f"Date saved: {metadata['date_saved']}")

            # Reuse the open archive instead of opening the zip a second time.
            weights_store = H5IOStore(
                _VARS_FNAME + ".h5", archive=zf, mode="r"
            )
            print_fn("Weights file:")
            inspect_nested_dict(weights_store.h5_file, print_fn, prefix="    ")
            weights_store.close()

    elif filepath.endswith(".weights.h5"):
        print_fn(f"Keras weights file '{filepath}'")
        # A standalone weights file is opened directly; there is no archive.
        weights_store = H5IOStore(filepath, mode="r")
        inspect_nested_dict(weights_store.h5_file, print_fn)
        weights_store.close()

    else:
        raise ValueError(
            "Invalid filename: expected a `.keras` or `.weights.h5` extension. "
            f"Received: filepath={filepath}"
        )


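def inspect_nested_dict(store, print_fn=print, prefix=""):
    # Walk the nested groups of the weights file, printing one line per
    # group and one line per weight variable.
    for key in store.keys():
        value = store[key]

        # Only groups (dict-like objects) are descended into.
        if hasattr(value, "keys"):
            skip = False
            # Skip entries whose only content is an empty "vars" group.
            if (
                list(value.keys()) == ["vars"]
                and len(value["vars"].keys()) == 0
            ):
                skip = True
            # Skip empty "vars" groups themselves.
            if key == "vars" and len(value.keys()) == 0:
                skip = True
            if not skip:
                print_fn(f"{prefix}{key}")
                inspect_nested_dict(value, print_fn, prefix=prefix + "    ")
                # Weight variables live directly under "vars".
                if key == "vars":
                    for k in value.keys():
                        w = value[k]
                        print_fn(f"{prefix}    {k}: {w.shape} {w.dtype}")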

(It relies on objects from keras/src/saving/saving_lib.py, like H5IOStore).

I think we want the following features:

  1. Show the contents of a file, down to visualizing weight variables as a series of color grids. The interface should make this easy: at first you only see the list of top-level layers, but you can click on any of them to expand their contents, etc. Finally you can click on a weight tensor to visualize it. HTML+JS is a great fit for this, compared to the command line.
  2. Show the diff compared to a reference_model. Highlight any incompatibilities or differences between the saved file contents and the structure of the reference model.
  3. Offer a way to rename a weight or layer, or delete one, or add one -- saving a new edited file as a result. This could be done with an interactive interface.
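
To make the expand-on-click idea from point 1 concrete, here is a minimal sketch that turns a nested dict into nested HTML `<details>` elements, which browsers render as collapsible sections without any JS. The function name `render_tree` and the assumption that the file contents have already been loaded into a plain nested dict are hypothetical; this is one possible starting point, not the intended design.

```python
import html


def render_tree(node, name="root"):
    """Render a nested dict as nested, collapsible <details> elements.

    Dicts become expandable sections; any other value is treated as a
    leaf and shown as "name: value". All text is HTML-escaped.
    """
    label = html.escape(str(name))
    if isinstance(node, dict):
        children = "".join(render_tree(v, k) for k, v in node.items())
        return f"<details><summary>{label}</summary>{children}</details>"
    return f"<div>{label}: {html.escape(str(node))}</div>"


# Hypothetical manifest, for illustration only.
manifest = {"dense": {"vars": {"0": "(4, 8) float32", "1": "(8,) float32"}}}
page = render_tree(manifest, "model.weights.h5")
```

In a notebook, the resulting string could be shown with `IPython.display.HTML(page)`. Clicking a weight to visualize it would still need some JS on top of this.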

What do you think?

@pmasousa
Author

That sounds fantastic!
The template already helps a lot. May I ask you here if I have any further questions?
Can you add @pedro-curto as a participant to this issue?

@fchollet
Member

Sure, you can just ask questions in this thread.

@sachinprasadhs sachinprasadhs added stat:contributions welcome A pull request to fix this issue would be welcome. and removed keras-team-review-pending Pending review by a Keras team member. labels May 13, 2024
@pmasousa
Author

pmasousa commented May 20, 2024

Hi @fchollet,
We would like some feedback on our current data visualization and structure.
Here is a Colab with what we have so far for the first feature, along with some test scripts to visualize the output.
Is this what you had in mind?
Additionally, we're struggling to find a robust way to handle aspect ratios for the graphs so that they work well with all types of dimensions. Do you have any suggestions or best practices for this?

Thank you very much for your assistance.
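
On the aspect-ratio question, one possible heuristic, offered only as a sketch and not something proposed in this thread, is to collapse a tensor's leading dimensions into rows and keep the last dimension as columns, falling back to a near-square grid when the result is too elongated. The function name `grid_shape` and the `max_ratio` threshold are hypothetical.

```python
import math


def grid_shape(shape, max_ratio=4.0):
    """Pick a (rows, cols) display grid for a tensor of the given shape.

    Leading dimensions are collapsed into rows and the last dimension
    becomes columns. If the longer side exceeds max_ratio times the
    shorter side, a near-square grid is used instead (the last row may
    then be only partially filled).
    """
    size = math.prod(shape)
    if size == 0:
        return (0, 0)
    if len(shape) >= 2:
        rows, cols = math.prod(shape[:-1]), shape[-1]
    else:
        rows, cols = 1, size
    if max(rows, cols) / min(rows, cols) <= max_ratio:
        return (rows, cols)
    cols = math.ceil(math.sqrt(size))
    rows = math.ceil(size / cols)
    return (rows, cols)
```

With the near-square fallback, the grid can hold more cells than the tensor has values, so the plotting code would need to pad or mask the unused cells.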


3 participants