
Describe integration with MLflow #3856

Conversation

astrojuanlu
Member

See #3541.

Description

Development notes

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

@astrojuanlu
Member Author

@astrojuanlu force-pushed the docs/kedro-mlflow branch 2 times, most recently from a8256cf to df922b7, on May 7, 2024 at 12:51
Closes #3541.

Signed-off-by: Juan Luis Cano Rodríguez <[email protected]>
@astrojuanlu marked this pull request as ready for review May 7, 2024 13:57
@astrojuanlu requested a review from yetudada as a code owner May 7, 2024 13:57
@astrojuanlu
Member Author

Note to reviewers:

The idea of this page is to complement the kedro-mlflow documentation, and in fact it contains several references to it. What this page adds is:

  • References to MLflow in the Kedro docs so users can easily find it
  • Short examples of how to create custom integrations using Hooks
  • Content that fills some gaps in kedro-mlflow

The idea is for this page to serve as a brief collection of MLOps use cases, and to use it as a template for future integrations.

I'm paging @stichbury as well because I was somewhat careless with the prose in certain parts.

I should disclose that Lilli wrote the first two paragraphs.

Comment on lines +287 to +289
At the moment it's not straightforward to specify the `run_id`
to use a specific version of a model for inference within a Kedro pipeline,
see discussion on [this GitHub issue](https://github.com/Galileo-Galilei/kedro-mlflow/issues/549).

Contributor

I see people doing it in production for a lot of models, so it is definitely possible ;) I really need to see the code you run to get that error, but essentially you cannot save a model into an existing run, because you would alter it, and that is definitely not the purpose of experiment tracking. However, there is absolutely no problem loading a model from an existing MLflow run (say, the spaceflights regressor you've just trained) to run a Kedro pipeline that logs to another MLflow run, or that can just be used without logging to MLflow at all.

Member Author

essentially you cannot save a model into an existing run, because you would alter it, and that is definitely not the purpose of experiment tracking

That makes a lot of sense. Still chewing on your explanation, will rewrite this part.

Contributor

@noklam left a comment

[screenshot omitted]

Is it possible to keep kedro-mlflow as its own section? The current structure is by feature ("tracking", "artifact"), whereas the rest of our docs usually go from "basic" to "advanced".

Comment on lines +75 to +76
"mlflow": ("https://www.mlflow.org/docs/2.12.1/", None),
"kedro-mlflow": ("https://kedro-mlflow.readthedocs.io/en/0.12.2/", None),
Contributor

Not sure what this config actually does. Why do we prefer mapping to a specific version over "stable" or "latest"?

Member Author

Because pinning the version means the links will never break ⭐ and it records which version of MLflow was used to write the docs.
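For reference, a sketch of how such pinned mappings sit in a Sphinx `conf.py` (the URLs mirror the ones in the diff; treat the exact versions as examples):

```python
# docs/source/conf.py (sketch)
# Pinned intersphinx inventories: cross-references keep resolving even if
# the "latest" upstream docs are reorganised, and the pin records which
# upstream version the prose was written against.
extensions = ["sphinx.ext.intersphinx"]

intersphinx_mapping = {
    "mlflow": ("https://www.mlflow.org/docs/2.12.1/", None),
    "kedro-mlflow": ("https://kedro-mlflow.readthedocs.io/en/0.12.2/", None),
}
```

The second tuple element (`None`) tells Sphinx to fetch `objects.inv` from the default location under the given URL.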


- A working Kedro project.
- The MLflow client properly installed in your project: `pip install mlflow`.
- (Recommended) An MLflow Tracking Server, which will allow you to quickly inspect the runs with a user-friendly web interface.
Contributor

Should we mention how to start a basic one with `mlflow ui`, etc.?

Member Author

It would be short and wouldn't hurt, I'd say. Good suggestion.
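For the record, the commands in question are short indeed (standard mlflow CLI; host and port shown are just the defaults):

```shell
# Quick local UI backed by the ./mlruns directory:
mlflow ui

# Or an explicit tracking server, e.g. to share runs with teammates:
mlflow server --host 127.0.0.1 --port 5000

# Point MLflow clients at the server:
export MLFLOW_TRACKING_URI=http://127.0.0.1:5000
```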

@astrojuanlu
Member Author

Is it possible to keep kedro-mlflow as its own section? The current structure is by feature ("tracking", "artifact"), whereas the rest of our docs usually go from "basic" to "advanced".

I went back and forth several times on how to structure this page. I like how I ended up making one section per use case, but I agree there might be other ways. Awaiting @stichbury's take on this.

@stichbury
Contributor

I went back and forth several times on how to structure this page. I like how I ended up making one section per use case, but I agree there might be other ways. Awaiting @stichbury's take on this.

I found the sectioning useful but agree with @noklam that we usually break into simple and advanced usage. Can I suggest we do the same here, with the structure as follows? I'll leave you to decide where the simple/advanced sections fall.

Maybe something like this?

However, I'm not that attached to this and if you want to stick with what you have, I'd say that's fine, but omit the basic second level "Use cases" header and promote all the following (currently 3rd level) to 2nd level.

Header

Prerequisites

Simple use cases
  • Tracking Kedro pipeline runs in MLflow using Hooks
  • Artifact tracking in MLflow using Hooks

Advanced use cases
  • Complete tracking of Kedro runs in MLflow using kedro-mlflow
  • Tracking Kedro in MLflow using the Python API
  • Artifact tracking in MLflow using kedro-mlflow
  • Model registry in MLflow using kedro-mlflow

@astrojuanlu
Member Author

My biggest gripe with this is that it seems wrong to label custom hooks as "basic" and kedro-mlflow, which is objectively fewer lines of code (just a `pip install kedro-mlflow` away), as "advanced". If anything, the former is more "custom", "ad hoc", or "homegrown", whereas the latter is more "off-the-shelf".

I can totally see how someone starts with the custom hook ("basic"), then makes it more complex as they need more functionality, until in the end it becomes far more difficult than just running `pip install kedro-mlflow` and letting the plugin take care of things, assuming the plugin does more or less exactly what you want.

@stichbury
Contributor

I can't work on your branch, so I've forked and made a PR to commit back to it: #3862

Please take a look, merge what you want, and I can review again when you have the entire page in your preferred final state (see comment about sectioning above).

* Some proposed edits

Signed-off-by: Jo Stichbury <[email protected]>

* Fix some Vale warnings

Signed-off-by: Jo Stichbury <[email protected]>

---------

Signed-off-by: Jo Stichbury <[email protected]>
@astrojuanlu
Member Author

I'd say that's fine, but omit the basic second level "Use cases" header and promote all the following (currently 3rd level) to 2nd level.

We agreed to do this 👍🏼 Will make the change today

Contributor

@DimedS left a comment

Thank you, @astrojuanlu, for the excellent manual. Everything is working well. It's a great starting point for exploring Kedro + MLflow. I've left a few minor comments.

We recommend use of an [MLflow tracking server](https://mlflow.org/docs/latest/tracking/server.html), which enables you to inspect runs through a web interface. This is optional, but strongly recommended, even if you're working locally on a solo project. This documentation assumes that you have a tracking server, whether it's a local one run by yourself or a remote one running on another machine or provided by a cloud vendor.
You will need to configure your client properly,
either by setting the `MLFLOW_TRACKING_URI` environment variable
or by using the {external+mlflow:py:func}`mlflow.set_tracking_uri` function.
Contributor

I don’t quite understand the idea of that section. I simply run mlflow ui and everything works well without any configuration. Perhaps we should include this command in the manual for simplicity, and provide links for those interested in more detailed information.

Contributor

Yes, I think this is very confusing for users. Actually, it is not mandatory to have a tracking server to be able to use the UI, and I think all the documentation examples should work without one. It is obviously highly recommended to have one for many reasons (persistence, collaboration, granular access, separate storage of heavy components from metadata...), but it's like saying "don't use local CSV files to store your data, use a database instead": beginners won't see the immediate need, and it's not a big deal for first use, I guess.

Member Author

You're both completely right; this stems from a misunderstanding I had (I confess my first experience with MLflow was kind of bumpy). I will simplify all this; agreed with @Galileo-Galilei that it's not really necessary to introduce this complexity early on.

import mlflow
from kedro.framework.hooks import hook_impl
from mlflow.entities import RunStatus

Contributor

I added the following lines to make it work:

import logging
logger = logging.getLogger(__name__)

Perhaps it would be sensible to include these in the manual?

Member Author

Good catch, will add


One possible way of doing it is using the {py:meth}`~kedro.framework.hooks.specs.PipelineSpecs.before_pipeline_run` Hook
to log the `run_params` passed to the Hook.
An implementation would look as follows:
Contributor

At this point, a brief explanation of what to do next would be helpful. Including a short description that instructs the reader to create a new file and register it according to the guidelines in the Kedro Hooks introduction would be useful.

Member Author

I wanted to avoid adding too much prose here, but at least a link to the Kedro Hooks intro would indeed be useful. Will add.
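For context, the registration step being discussed is a one-liner in the project's settings.py. A sketch, with hypothetical package and class names:

```python
# src/<your_package>/settings.py (sketch)
# "your_package" and "MLflowTrackingHooks" are placeholder names.
from your_package.hooks import MLflowTrackingHooks

HOOKS = (MLflowTrackingHooks(),)
```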


Comment on lines +104 to +107
:::{note}
If you implemented your custom `MLflowHooks` as described in the previous section,
consider disabling them to avoid logging the runs twice.
:::
Contributor

I think this is confusing for beginners, and the warning should be at the beginning of the section. If one reads the section linearly from top to bottom (which is what you usually do 😄), it is very hard to understand that this section suggests replacing the custom hook with kedro-mlflow, not adding it on top. Eventually, as a beginner, I would likely wonder: "why on earth did I start building something that the next section suggests removing?"

I think a much nicer (advanced) example would be a combination of kedro-mlflow and a custom hook (which would, for example, track something custom that kedro-mlflow does not, e.g. the commit SHA), while still benefiting from kedro-mlflow's configuration management, because the kedro-mlflow hook runs first.

Member Author

This is a very valid point and goes back to something @stichbury had mentioned to me as well about mixing "explanation" with "how to" content in here.

The first hook is supposed to have pedagogical value but you're right it kind of breaks the flow.

I'll see if I can rework this to keep that first part as an explainer, or whether I'll have to drop it completely and, as you say @Galileo-Galilei, evolve from kedro-mlflow with custom hooks used in combination with the plugin.

Comment on lines +121 to +123
Notice that `kedro-mlflow` assumes 1 Kedro project = 1 MLflow experiment.
If this does not suit your use case, consider pursuing a more customised approach
like the one described in the previous section.
Contributor

Actually, this is the default, but it is fully customisable in mlflow.yml, so I wouldn't call it an assumption (you can even use a custom resolver to create something very custom and dynamic in mlflow.yml if you want).
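For illustration, the experiment name override lives in the project's mlflow.yml. A sketch against what I understand to be the kedro-mlflow 0.12 schema (the experiment name is a placeholder):

```yaml
# conf/base/mlflow.yml (sketch)
tracking:
  experiment:
    name: my_custom_experiment  # defaults to the Kedro package name
```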

Member Author

Very true. I'll reword this section.

Comment on lines +193 to +194
Notice though that storing artifacts as Python pickles
prevents you from fully using MLflow native preview capabilities.
Contributor

Is it really necessary to point this out? Just a nitpick, but maybe it would be nicer to write it the other way around, with a CSV or PNG example that renders with a nice preview, and say that MLflow offers previews for some data formats.


@astrojuanlu
Member Author

Thanks for the review @Galileo-Galilei 🙏🏼 will address your comments ASAP.
