[TRACKING] discussion & planning for future of kubeflow/kubeflow repo #7549

Open
thesuperzapper opened this issue Apr 12, 2024 · 36 comments

Comments

@thesuperzapper
Member

thesuperzapper commented Apr 12, 2024

As part of kubeflow/internal-acls#618 (giving @kimwnasptd and @thesuperzapper write access to the kubeflow/kubeflow repo), we were asked by the @kubeflow/kubeflow-steering-committee to make a plan for the future of the kubeflow/kubeflow repo.

Background

All components that currently live in kubeflow/kubeflow (under the ./components/ folder) are owned and maintained by @kubeflow/wg-notebooks-leads.

People have identified a few issues with this:

  1. It can make development harder:
    • We want to release our components more frequently than Kubeflow itself (which may create confusion for users, if they see a 2.X.X release on the kubeflow/kubeflow repo)
    • Until recently, we didn't even have write access to the repo (e.g. could not create new branches, tags, or cherry-pick commits)
  2. New users are sometimes confused about where to find source code

Options

There is no clear "best option" for the future of the kubeflow/kubeflow repo, but here are the three I can see.

NOTE: any plan that involves moving code will break ALL pending PRs on that code, in addition to many historical links (unless we include README files to say the code is moved).
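As a rough sketch of the README approach mentioned in the note, a small script could write a stub into each vacated directory that points at the new location. The component path and destination repo used here are placeholders for illustration, not a decided layout.

```python
from pathlib import Path
import tempfile


def moved_notice(component: str, new_repo: str) -> str:
    """Return README text that redirects readers to the new repository."""
    return (
        f"# {component} has moved\n\n"
        f"This code now lives in https://github.com/{new_repo}.\n"
        "Please open new issues and PRs there.\n"
    )


def write_stub(root: Path, component_dir: str, new_repo: str) -> Path:
    """Create <root>/<component_dir>/README.md containing the redirect notice."""
    target = root / component_dir
    target.mkdir(parents=True, exist_ok=True)
    readme = target / "README.md"
    readme.write_text(moved_notice(Path(component_dir).name, new_repo))
    return readme


if __name__ == "__main__":
    # Hypothetical example: both the component path and the repo name are assumptions.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_stub(Path(tmp), "components/example-component", "kubeflow/example-repo")
        print(path.read_text())
```

A stub like this keeps old directory links resolvable, although deep links to individual files would still break.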

Option 1: do nothing

We could just leave everything as is.

The existing code has lived for so long in its current location, and we can address most concerns with better documentation.

Option 2: move non-core components

REMOVED

Option 3: move everything ⭐

I think there are 2 isolatable sections of Kubeflow that live in kubeflow/kubeflow right now:

  1. Kubeflow "Dashboard" (Dashboard, Profiles, Auth)
  2. Kubeflow "Notebooks" (Notebooks 1.0, Notebooks 2.0, Tensorboards, Volumes)

That would leave us with the following:

@andreyvelich
Member

Thank you for opening this @thesuperzapper!
Please can you also add the suggestion that @kimwnasptd proposed here: kubeflow/internal-acls#618 (comment)

@juliusvonkohout
Member

juliusvonkohout commented Apr 15, 2024

I would like to add @andreyvelich's option mentioned here: kubeflow/internal-acls#618 (comment)

My personal opinion based on Andrey's proposal is:

The more repositories we have, the more cumbersome development becomes. There is also the requirement from the KSC to split kubeflow/kubeflow. In my opinion, kubeflow/control-plane + kubeflow/workspaces and kubeflow/manifests adds too much development and synchronization overhead. Moving the multi-user/multi-tenancy stuff into kubeflow/manifests, where most of the multi-tenancy stuff already lives, makes more sense to me. Splitting what we have to release together anyway across multiple repositories does not make much sense to me. KFAM, the profile controller, etc. are tightly coupled with kubeflow/manifests. I am also fine with renaming kubeflow/manifests to kubeflow/control-plane or kubeflow/platform. But we need as few repositories as possible and a common place for the multi-user stuff. For example, notebooks, PVC-viewer and maybe some other things could stay in kubeflow/workspaces.

@kimwnasptd
Member

kimwnasptd commented Apr 15, 2024

I like @juliusvonkohout's arguments. My proposal would likewise be to move multi-tenancy closer to manifests and the rest to kubeflow/workspaces (for now). As a next step afterwards, we would define what to do with those extra components (volume mgmt, tensorboard, poddefaults) and potentially move them away from workspaces.

@thesuperzapper has a good point that right now the manifests repo is laser-focused on providing a catalogue of manifests, and I agree it should stick to this. It would be confusing from a user's point of view to suddenly see multi-tenancy code in a manifests repository.

So my proposal is the following, after taking into consideration @juliusvonkohout @thesuperzapper and @andreyvelich's points:

  1. Change scope of kubeflow/manifests for handling multi-tenancy
  2. Have a new repo kubeflow/multi-tenancy that will be a subproject of manifests
  3. The kubeflow/multi-tenancy repo will contain
    1. Profiles and KFAM
    2. Central Dashboard (interacts with both KFAM and Profiles)
    3. Istio manifests
    4. OIDC manifests (oauth2-proxy and Dex)
  4. Have a kubeflow/workspaces repo which will contain the rest from kubeflow/kubeflow
    1. Notebooks controller and web app
    2. Example notebook servers
    3. Volumes web app and pvcviewer controller
    4. Tensorboards web app and controller
    5. PodDefaults
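To make the proposed split easier to scan, the list above can be written down as data with a small lookup. This is only an illustrative sketch: the repo names are the proposals from this comment, not existing repositories, and `repo_for` is a hypothetical helper.

```python
# Proposed layout from the list above, written as data.
# Repo names are proposals from this thread, not existing repositories.
PROPOSED_LAYOUT = {
    "kubeflow/multi-tenancy": [
        "Profiles and KFAM",
        "Central Dashboard",
        "Istio manifests",
        "OIDC manifests (oauth2-proxy and Dex)",
    ],
    "kubeflow/workspaces": [
        "Notebooks controller and web app",
        "Example notebook servers",
        "Volumes web app and PVC viewer controller",
        "Tensorboards web app and controller",
        "PodDefaults",
    ],
}


def repo_for(keyword: str) -> list[str]:
    """Return the proposed repos whose component list mentions the keyword."""
    needle = keyword.lower()
    return [
        repo
        for repo, components in PROPOSED_LAYOUT.items()
        if any(needle in component.lower() for component in components)
    ]


if __name__ == "__main__":
    print(repo_for("KFAM"))
    print(repo_for("tensorboard"))
```

A table like this could also back an issue-routing hint, so users know which repo to file against for a given component.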

Note that for multi-tenancy, I explicitly didn't mention new WGs. Although I believe this makes sense down the road, we can start with this being a subproject of manifests, since all the discussions are already happening there with @juliusvonkohout and me.

@kimwnasptd
Member

The above is to immediately unblock the effort of cleaning up kubeflow/kubeflow for now. Then the next goal should be to help on the scope of what the Notebooks WG owns, and what is included in the kubeflow/workspaces repo.

Specifically I would suggest we think about and have answers with the @kubeflow/kubeflow-steering-committee on the following:

  1. What do we do with components that aim to make it more streamlined to interact with K8s? (poddefaults, volumes web app for managing pvcs, pvcviewer controller)
  2. TensorBoard is an ML tool, but doesn't really fit to live under kubeflow/workspaces. Let's either
    1. Deprecate the controller and web app if it's not used a lot (should we have a survey? cc @StefanoFioravanzo)
    2. Have a new repo, in the future, for this and potentially other Data Visualisation tools

IMO answering the above will also help keep the kubeflow/workspaces repo cleaner, by focusing it on notebooks/workspaces rather than on other K8s functionality or data visualisation tools.

@andreyvelich
Member

cc @kubeflow/wg-pipeline-leads @kubeflow/wg-training-leads @kubeflow/wg-deployment-leads @kubeflow/wg-model-registry-leads @kubeflow/kubeflow-steering-committee for the feedback.

@StefanoFioravanzo
Member

I agree with @kimwnasptd's proposal.

@kimwnasptd just to clarify: you are suggesting that the kubeflow/multi-tenancy repo would be under the Manifests WG responsibility, correct? We would probably need a "Control Plane WG" that is both responsible for kubeflow/manifests and kubeflow/multi-tenancy.

My 2 cents on:

What do we do with components that aim to make it more streamlined to interact with K8s?

These would all make sense as standalone projects, but they cannot stay outside of one of the existing working groups just yet. I am wondering why @juliusvonkohout suggests that we should have as few repos as possible. What is stopping us from having kubeflow/poddefaults and kubeflow/volume-management repos? These could still be Notebooks WG subprojects, but at least have a separate lifecycle that could promote more contributions.

TensorBoard is an ML tool, but doesn't really fit to live under kubeflow/workspaces.

Good observation. We don't have data points as to how popular this is. I don't think that this component should stay under the Notebooks WG. I think we should:

  1. Agree that indeed TF Controller cannot live under kubeflow/workspaces
  2. Decide if we are okay with having a kubeflow/tensorboard repo
  3. If NO (due to operational difficulties with having a separate repo, for whatever reason), then Deprecation is the only option
  4. If YES, then do a call to action with a deadline. If someone is willing to maintain this repo, then we can figure out how to do so.

@juliusvonkohout
Member

These would all make sense as standalone projects, but they cannot stay outside of one of the existing working groups just yet. I am wondering why @juliusvonkohout suggests that we should have as few repos as possible. What is stopping us from having kubeflow/poddefaults and kubeflow/volume-management repos? These could still be Notebooks WG subprojects, but at least have a separate lifecycle that could promote more contributions.

It adds so much overhead as Software developer, maintainer and reviewer. Just try it out yourself :-D. You will usually get lost in Processes and talking with less code and way too much communication and synchronization overhead

@thesuperzapper
Member Author

thesuperzapper commented Apr 15, 2024

Based on some discussions today I have updated my proposed "Option 3" above to suggest splitting the repo up into kubeflow/dashboard (could also be called kubeflow/platform), and kubeflow/workspaces.

The goal would be to build "Notebooks 2.0" in a separate branch of kubeflow/workspaces and eventually have it replace the need for the volumes and tensorboard controllers.

@rimolive
Member

rimolive commented Apr 15, 2024

@kimwnasptd just to clarify: you are suggesting that the kubeflow/multi-tenancy repo would be under the Manifests WG responsibility, correct? We would probably need a "Control Plane WG" that is both responsible for kubeflow/manifests and kubeflow/multi-tenancy.

This is a great idea, it can be WG Manifests responsibility as they are already working on both.

It adds so much overhead as Software developer, maintainer and reviewer. Just try it out yourself :-D. You will usually get lost in Processes and talking with less code and way too much communication and synchronization overhead

Maybe the overhead comes from the fact that there are too many manual steps to accomplish this? If so, we can figure out a way to automate it and trigger it only when we need to cut releases for Kubeflow.

@StefanoFioravanzo
Member

Maybe the overhead comes from the fact that there are too much manual steps to accomplish this? If so, we can figure out a way to automate it and trigger only when we need to cut releases for Kubeflow.

Exactly! This seems like something that should not block the creation of new repos, but rather encourage us to find ways to remove barriers and simplify the process.

@StefanoFioravanzo
Member

The goal would be to build "Notebooks 2.0" in a separate branch of kubeflow/workspaces and eventually have it replace the need for the volumes and tensorboard controllers

@thesuperzapper what do you mean by that?

@andreyvelich
Member

andreyvelich commented Apr 16, 2024

In addition to the @thesuperzapper comment above: #7549 (comment) I would like to add the following ideas based on our recent discussion.

I propose the idea that we should create GitHub repos for Kubeflow components only when it makes sense to call the tool as an individual sub-project and can be deployed as a standalone application.
For example: Kubeflow Notebooks, Kubeflow Pipelines, Kubeflow Katib, Kubeflow Model Registry, Kubeflow Spark Operator, and Kubeflow Training Operator.
Usually, those components can have their own release schedule.

Thus, from my perspective, to find a place for the "common" components (e.g. profile controller, central dashboard, TensorBoard, PVC Viewer) we should define a new entity called Kubeflow Platform, which provides a way to deploy everything together and requires those "common" components.
Until we identify clear user requirements for using those components as stand-alone applications, I am not sure we need to separate them.

That should help us explain more clearly how Kubeflow can be used:

  1. Install Kubeflow Platform from manifests.
  2. Install Kubeflow Platform from package distribution.
  3. Install Kubeflow components standalone.

Option 1: Short-term simple solution

Since we don't need to version these "common" components separately, move them to kubeflow/manifests and the Notebooks components to kubeflow/workspaces, as I mentioned before: kubeflow/internal-acls#618 (comment)

Option 2: Create kubeflow/platform for common components

Move all common components to kubeflow/platform and the Notebooks components to kubeflow/workspaces.

What do you think about it?

@StefanoFioravanzo
Member

Option 2 seems to be the most future-proof and avoids confusion.

@rimolive
Member

I'll go with Andrey's Option 2 as well. Putting KF component code into kubeflow/manifests would blur the stated objective of the manifests repo.

@juliusvonkohout
Member

juliusvonkohout commented Apr 17, 2024

As far as I understand, @andreyvelich, the second option just implies renaming kubeflow/manifests to kubeflow/platform, but still having the same content as in Option one.

I am in favor of option one with the renaming to kubeflow/platform.

Because I agree with "I propose the idea that we should create GitHub repos for Kubeflow components only when it makes sense to call the tool as an individual sub-project and can be deployed as a standalone application.
For example: Kubeflow Notebooks, Kubeflow Pipelines, Kubeflow Katib, Kubeflow Model Registry, Kubeflow Spark Operator, and Kubeflow Training Operator.
Usually, those components can have their own release schedule."

@andreyvelich
Member

the second option just implies renaming kubeflow/manifests to kubeflow/platform, but still having the same content as in Option one.

From my point of view this is the right approach, since the manifests repo is the superset of all Kubeflow components needed to deploy the Kubeflow Platform product. Also, these components are versioned the same as manifests.

@thesuperzapper
Member Author

I strongly believe that the manifests repo should ONLY aggregate the manifests (which are authored in the other repos).

There is no benefit to bringing code into the manifests aggregation repo. We would only create problems that make it harder to develop both the code and aggregate the manifests in a usable way.


The three "components" (isolatable sections of Kubeflow) that live in kubeflow/kubeflow right now are:

  1. Kubeflow "Dashboard":
    • the central dashboard itself
    • profile controller
    • KFAM
    • (and probably the manifests to deploy Istio, dex, and oauth2-proxy)
  2. Kubeflow "Workspaces":
    • Kubeflow Notebooks (controller + UI + example images)
    • PVC Management (controller + UI)
    • Tensorboards (controller + UI)
  3. Kubeflow "Admission" (PodDefaults):
    • This one should either be part of dashboard or have its own repo.
    • It is used frequently in KFP, in addition to Notebooks, so should be able to be deployed separately.

Given that, I think we should create 2-3 new repos for these components, so they can be versioned on their own lifecycle:

  • kubeflow/dashboard
  • kubeflow/workspaces
  • kubeflow/admission (or this can live under the dashboard repo, to reduce the number of repos)

@andreyvelich
Member

andreyvelich commented Apr 18, 2024

I strongly believe that the manifests repo should ONLY aggregate the manifests (which are authored in the other repos). There is no benefit to bringing code into the manifests aggregation repo. We would only create problems that make it harder to develop both the code and aggregate the manifests in a usable way.

Please can you explain what kind of problems we are going to have if we combine platform components and manifests in a single repo?

From my point of view, the benefit of combining manifests and platform components in a single repo is to simplify the process of making releases and triaging issues from our end users. E.g. the fewer GitHub repos we have, the fewer PRs and issues will be left abandoned.
Also, the manifests repo is only required for Kubeflow Platform, so combining them in a single repo makes sense to me.

Kubeflow "Workspaces":

@thesuperzapper Why do you want to include PVC Management and Tensorboards in kubeflow/workspaces?

@james-jwu
Contributor

Throwing in my 2c here. I think we are trying to solve a number of issues: engineering velocity, customer perception & experience, architectural "cleanliness", and ensuring maintainability of repos. IMO not all of these issues can be solved by repo organization, but maybe we can focus on the most important issues, and find a compromise on the less important ones.

To Andrey's point about "we should create GitHub repos for Kubeflow components only when it makes sense to call the tool as an individual sub-project [1] and can be deployed as a standalone application.[2]", I agree with part [1]. On part [2], I think there is a chance of the "platform" growing too big, so it is not necessary to have a single "platform" repo. Rather, we could create repos for large feature areas. Today the only user-facing feature in "platform" is the central dashboard, so we could create a repo named as such. This will reduce confusion for customers when they try to file issues, etc.

For the common components (profile controller, etc.), there are several options: leave them with central dashboard, create a separate repo, or move some/all to the manifests repo. I think we can let the current maintainers of these components and of the manifests repo decide. It may not result in the cleanest option from an architectural perspective, but I want to optimize the developer/maintainer workflow, because it's not like we have a large group of developers behind each component.

Another point I want to make is whether we should separate the workspace and notebook repos. IMO it may be a better product strategy to give customers confidence about product continuity from Notebook to its next version. So maybe having workspace developed in the notebooks repo is a better choice from this perspective.

@thesuperzapper
Member Author

@james-jwu I 100% agree that the repos should be named by their "user-facing purpose".

That's why my initial proposal was to name the repo which contains dashboard/profile-controller/kfam as kubeflow/dashboard. However, to make the name more flexible (e.g. allow admission-webhook to be included), and open the possibility of "branding" the frontend of Kubeflow as "Kubeflow Central", I quite like the repo name kubeflow/central.

In any case, it's not possible to use the dashboard without profile-controller or kfam so I plan to keep them in the same repo as each other.


While we could develop notebooks/workspaces in the same repo, because we want to version/release them separately (and allow them to be deployed alongside each other), I think separate repos are cleaner.

My initial thought was to use separate branches of kubeflow/workspaces, but that just leads to more confusion, as branches aren't very visible, and makes it harder to "archive" Notebooks 1.0 once 2.0 is mature.


In any case, it's clear the next steps are to:

  1. Create a new kubeflow/workspaces repo (so we can start scaffolding the Notebooks 2.0 code ASAP)
  2. Create a new kubeflow/notebooks repo (so we can migrate the Notebooks 1.0 components to it)
  3. Continue to discuss the future location for dashboard/profile/kfam/poddefaults (and leave them in kubeflow/kubeflow until we have a decision).

If @kimwnasptd agrees on steps 1 and 2, what is the process to create those new repos @james-jwu?

@andreyvelich
Member

That's why my initial proposal was to name the repo which contains dashboard/profile-controller/kfam as kubeflow/dashboard. However, to make the name more flexible (e.g. allow admission-webhook to be included), and open the possibility of "branding" the frontend of Kubeflow as "Kubeflow Central", I quite like the repo name kubeflow/central.

As @juliusvonkohout said before (#7549 (comment)), most of the multi-tenancy stuff already lives in kubeflow/manifests, which is why we can move profile-controller, kfam, and admission-webhook to kubeflow/manifests.

I think we should gather feedback from folks in the community who are planning to maintain these components (e.g. @kubeflow/wg-manifests-leads @kubeflow/wg-training-leads).
Can we find folks in the community who can maintain these components?

Also, what do you think about this question: "Which repo would a user go to if they hit issues with the Profile Controller?"

My initial thought was to use separate branches of kubeflow/workspaces, but that just leads to more confusion, as branches aren't very visible, and makes it harder to "archive" Notebooks 1.0 once 2.0 is mature.

Why can't you use the same branches for Notebooks 1.0 and 2.0 in kubeflow/workspaces?

For example, when @kimwnasptd and team worked on the new Katib UI: https://github.com/kubeflow/katib/projects/1, we created a new directory new-ui and committed code there. For a while we were releasing 2 Katib UI images for old and new UI. After a while we deprecated the old Katib UI: kubeflow/katib#2179.

@StefanoFioravanzo
Member

I agree with @james-jwu on keeping workspaces in the same repo as notebooks. It's much better to provide continuity by keeping both workstreams in the same repo, even if versioning/releasing may be slightly more challenging at the beginning. It makes much more sense from a product perspective and gives much more clarity to users and contributors alike.

@juliusvonkohout
Member

juliusvonkohout commented Apr 23, 2024

@andreyvelich since you asked me directly in the community call: Yes, I am willing to maintain the components in the kubeflow/platform kubeflow/workspaces split, and also in Matthew Wicks' kubeflow/dashboard kubeflow/workspaces kubeflow/admission kubeflow/platform split, as well as something in between like kubeflow/manifests kubeflow/workspaces kubeflow/multi-tenancy.

@thesuperzapper
Member Author

Hey @andreyvelich @james-jwu @kimwnasptd, we discussed this in the Notebooks WG meeting today, and we are happy to keep notebooks and workspaces in the same kubeflow/notebooks repo.

Therefore, since we are all in agreement that at least kubeflow/notebooks needs to exist, can we please start the process to create that repository?

@rimolive
Member

rimolive commented Apr 25, 2024

We need to work on a 1.9.0-rc0 release by the week of April 29th, so when separating notebooks from the main repo, make sure this doesn't affect or block cutting a notebooks release for 1.9.0-rc0.

@andreyvelich
Member

Hey @andreyvelich @james-jwu @kimwnasptd, we discussed this in the Notebooks WG meeting today, and we are happy to keep notebooks and workspaces in the same kubeflow/notebooks repo.

Therefore, since we are all in agreement that at least kubeflow/notebooks needs to exist, can we please start the process to create that repository?

Sure, we can start that. Let's discuss this tomorrow @kubeflow/kubeflow-steering-committee.

@kimwnasptd @thesuperzapper Do you want to use kubeflow/notebooks repo in the future to develop Notebooks 2.0, a.k.a Kubeflow Workspaces ?

@thesuperzapper
Member Author

Sure, we can start that. Let's discuss this tomorrow @kubeflow/kubeflow-steering-committee.

@kimwnasptd @thesuperzapper Do you want to use kubeflow/notebooks repo in the future to develop Notebooks 2.0, a.k.a Kubeflow Workspaces ?

Yes, that's what I meant by my message in #7549 (comment).

Both the existing notebooks and new workspaces code will be on the same repo (and in the same branch).

thesuperzapper pinned this issue Apr 26, 2024

@andreyvelich
Member

We discussed this topic today during the KSC call and we are happy to create the new kubeflow/notebooks repo to migrate the Notebooks-related components.
@kimwnasptd Please can we get confirmation from you as well, since you are a member of WG Notebooks?

@kimwnasptd
Member

@andreyvelich sounds good!

@andreyvelich
Member

@kubeflow/wg-notebooks-leads kubeflow/notebooks repo has been created: https://github.com/kubeflow/notebooks 🎉
Thank you for doing this @zijianjoy!

@thesuperzapper
Member Author

@james-jwu @zijianjoy thanks!

Can we please also:

  1. give GitHub write access to @thesuperzapper and @kimwnasptd on the new repo
    • (so we can approve GitHub actions and cut tags)
    • we will probably need to raise a new PR in kubeflow/internal-acls for this
  2. ensure the branch protection rules are set up like kubeflow/kubeflow:
    • so that the main branch cannot be pushed to (so people can't bypass the bot)
    • so that DCO is required to pass under "Require status checks to pass before merging" (NOTE: the option for this won't come up until we raise a PR the first time).
  3. Add kubeflow/notebooks to the following config (so that people's PRs are not automatically self-approved):
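Separately from the config referenced in item 3 (which is not reproduced here), the branch protection rules in item 2 could be sketched as a request body for GitHub's REST "update branch protection" endpoint, assuming the rules are applied through the API rather than the settings UI. The check name "DCO" and the account name `example-merge-bot` are assumptions for illustration, and the snippet only builds the payload without sending anything.

```python
import json


def branch_protection_payload(required_checks, allowed_push_users):
    """Build a request body for GitHub's "update branch protection" endpoint
    (PUT /repos/{owner}/{repo}/branches/{branch}/protection)."""
    return {
        # Status checks that must pass before merging.
        "required_status_checks": {"strict": False, "contexts": list(required_checks)},
        # Apply the rules to administrators as well.
        "enforce_admins": True,
        # No review requirement is configured in this sketch.
        "required_pull_request_reviews": None,
        # Only the listed accounts may push to the branch.
        "restrictions": {"users": list(allowed_push_users), "teams": [], "apps": []},
    }


if __name__ == "__main__":
    # "DCO" as the check name and "example-merge-bot" are assumptions for illustration.
    payload = branch_protection_payload(["DCO"], ["example-merge-bot"])
    print(json.dumps(payload, indent=2))
```

Whether the real setup restricts pushes this way or relies on other settings is not stated in this thread, so treat the `restrictions` block in particular as a guess.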

@thesuperzapper
Member Author

I have raised a separate PR to give @kimwnasptd and @thesuperzapper write access to the new kubeflow/notebooks repo:

@juliusvonkohout
Member

@andreyvelich what are the next steps planned then?

@andreyvelich
Member

@james-jwu @zijianjoy Please can you let us know if you made the changes requested in @thesuperzapper's comment: #7549 (comment)

@andreyvelich what are the next steps planned then?

The next steps are:

  1. @kubeflow/wg-notebooks-leads should transfer the Kubeflow Notebooks code to kubeflow/notebooks.

  2. Transfer Notebooks PRs and Issues to the new repo from kubeflow/kubeflow.

  3. We need to identify a WG that can take responsibility for maintaining the Kubeflow Platform control-plane components:

Maybe we can spend a few minutes on this in tomorrow's community call cc @jbottum

@thesuperzapper
Member Author

@andreyvelich @james-jwu @zijianjoy I have raised a PR in the GoogleCloudPlatform/oss-test-infra repo to require self-approval for root-level approvers.

(So we don't have drive-by LGTMs accidentally merging PRs which are not ready).

GoogleCloudPlatform/oss-test-infra#2271

@juliusvonkohout
Member

juliusvonkohout commented May 7, 2024

Regarding "Transfer Notebooks PRs and Issues to the new repo from kubeflow/kubeflow.", maybe our GSoC student @hansinikarunarathne can help with that @rimolive
