Add simplified model manager install API to InvocationContext #6132
Conversation
Force-pushed 9cc1f20 to af1b57a
I have added a migration script that tidies up the `models/core` directory.
Force-pushed 537a626 to 3ddd7ce
Force-pushed 3ddd7ce to fa6efac
I'm not sure what I was expecting the implementation to be, but it definitely wasn't as simple as this - great work.
I've requested a few changes and there's one discussion item that I'd like to marinate on before we change the public invocation API.
Review thread on invokeai/app/services/shared/sqlite_migrator/migrations/migration_10.py (outdated, resolved)
@psychedelicious I think all the issues are now addressed. OK to approve and merge?
Sorry, no. We need to apply the same pattern to that last processor.
I'll take a look at it Friday.
I've refactored
I think we'd just need to update the installer script with special handling to uninstall those packages if they are already installed. It's probably time to revise our optional dependency lists: I think "cuda" and "cpu" make sense as the only two user-facing options. "xformers" is extraneous now (torch's native SDP implementation is just as fast), so it could be removed.
Thanks for cleaning up the pose detector. It would be nice to use the model context so we get memory management, but that is a future task. I had some feedback from earlier about the public API that I think was lost:

If both of those args are removed, then

Also, will these methods work for diffusers models? If so, "ckpt" probably doesn't need to be in the name.
The onnxruntime model loading architecture seems to be very different from what the model manager expects. In particular, the
Right now the regex token handling is done in a part of the install manager that is not called by the simplified API. I'll move this code into the core
I think you're saying this should be a global config option and I agree with that. Can we get the config migration code in so that I have a clean way of updating the config?
Not currently. It only works with checkpoints. I'd planned to add diffusers support later, but I guess I should do that now. Converting to draft.
Probably doesn't make sense to spend time on the onnx loading. This is the only model that uses it.
Sounds good.
I don't think any migration is necessary - just add a sensible default value, maybe it should be 0 (no timeout). I'll check back in on the config migration PR this week.
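A sketch of the default-value approach, assuming a pydantic-style settings class; the field name here is hypothetical, not from the PR:

```python
from pydantic import BaseModel, Field


class DownloadSettings(BaseModel):
    # Defaulting to 0 ("no timeout") means existing config files need no
    # migration step: absent fields simply pick up the default.
    download_timeout: int = Field(
        default=0,
        description="Seconds before a download times out; 0 disables the timeout",
    )
```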
Ok, thanks.
I've played with this a bit. It is easy to load the openpose onnx sessions into the RAM cache, and they will run happily under the existing MM cache system. However, onnx sessions do their own internal VRAM/CUDA management, so for as long as the session object is in RAM, it holds on to a substantial chunk of VRAM (1.7 GB). The openpose session is only used during conversion of an image into a pose model, and I think it's better to have slow disk-based loading of the openpose session than to silently consume a chunk of VRAM that interferes with later generation.
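For reference, the behavior described above can be seen with plain onnxruntime (the model path below is illustrative): a CUDA-backed session claims its VRAM workspace at construction and holds it for the session's lifetime, outside the MM cache's accounting.

```python
import onnxruntime as ort

# Constructing the session immediately allocates a CUDA workspace, which is
# held until the session object is released - even while the session sits
# "idle" in the RAM cache.
session = ort.InferenceSession(
    "dw_openpose.onnx",  # illustrative path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```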
@psychedelicious This is ready for your review now. There are now just two calls: `load_and_cache_model()` and `download_and_cache_model()`.
Summary
This adds two model manager-related methods to the `InvocationContext` uniform API. They are accessible via `context.models.*`:

`load_and_cache_model(source: Path | str | AnyHttpUrl, loader: Optional[Callable[[Path], Dict[str, Tensor]]] = None) -> LoadedModel`
Load the model located at the indicated path, URL, or repo_id.

This will download the model from the indicated location, cache it locally, and load it into the model manager RAM cache if needed. If the optional `loader` argument is provided, the loader will be invoked to load the model into memory. Otherwise the method will call `safetensors.torch.load_file()` or `torch.load()` (with a pickle scan) as appropriate to the file suffix. Diffusers models are supported via HuggingFace repo_ids.

Be aware that the `LoadedModel` object will have a `config` attribute of `None`.

Here is an example of usage:
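A minimal sketch, assuming the call is made from inside a node's `invoke()` method; the URL and `my_custom_loader` are illustrative assumptions rather than code from this PR:

```python
from pathlib import Path
from typing import Dict

from torch import Tensor

from invokeai.app.services.shared.invocation_context import InvocationContext


def my_custom_loader(path: Path) -> Dict[str, Tensor]:
    """Hypothetical custom loader matching the optional `loader` signature."""
    ...


def load_example(context: InvocationContext) -> None:
    # First call downloads the file into the models cache and loads it;
    # subsequent calls are served from the model manager RAM cache.
    loaded_model = context.models.load_and_cache_model(
        "https://example.com/models/RealESRGAN_x4plus.pth"  # illustrative URL
    )

    # As noted above, ad-hoc models carry no config record.
    assert loaded_model.config is None

    with loaded_model as model:
        # For the default loaders, `model` is the state dict returned by
        # safetensors.torch.load_file() or torch.load() (with a pickle
        # scan), chosen by file suffix.
        ...

    # Local paths (and HuggingFace repo_ids) also work, and a custom
    # loader may be supplied in place of the defaults:
    context.models.load_and_cache_model(
        Path("/opt/models/custom.ckpt"),
        loader=my_custom_loader,
    )
```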
`download_and_cache_model(source: str | AnyHttpUrl, access_token: Optional[str] = None, timeout: Optional[int] = 0) -> Path`

Download the model file located at `source` to the models cache and return its `Path`.
This will check `models/.download_cache` for the desired model file and download it from the indicated source if it is not already present. The local `Path` to the downloaded file is then returned.
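A sketch of the download-only call, again assuming an `InvocationContext` named `context` is in scope; the URL is illustrative:

```python
from pathlib import Path

# Checks models/.download_cache first and downloads only if the file
# is not already present; nothing is loaded into the RAM cache.
model_path: Path = context.models.download_and_cache_model(
    "https://example.com/models/dw_openpose.onnx",  # illustrative URL
    access_token=None,  # optional token for gated sources
    timeout=0,          # 0 = no timeout (the default)
)

# The caller decides how to open the returned file.
print(model_path)
```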
Other Changes

This PR performs a migration in which it renames `models/.cache` to `models/.convert_cache`, and migrates previously-downloaded ESRGAN, openpose, DepthAnything, and LaMa inpaint models from the `models/core` directory into `models/.download_cache`.
There are a number of legacy model files in `models/core`, such as GFPGAN, which are no longer used. This PR deletes them and tidies up the `models/core` directory.

Related Issues / Discussions
I have systematically replaced all the calls to `download_with_progress_bar()`. This function is no longer used elsewhere and has been removed.
I have added unit tests for the new calls. You may test that the `load_and_cache_model()` call is working by running the upscaler within the web app. On the first try, you will see the model file being downloaded into the `models/.download_cache` directory. On subsequent tries, the model will either load from RAM (if it hasn't been displaced) or be loaded from the filesystem.
Squash merge when approved.
Checklist