Support MLX on Kubernetes with Kubeflow #2047

andreyvelich · 2024-04-10T00:09:57Z

MLX is a new ML framework specifically designed to run on Apple silicon: https://github.com/ml-explore/mlx

It has some differences compared to PyTorch with the mps backend: ml-explore/mlx#12 (comment)

It would be nice to integrate MLX into the Kubeflow ecosystem for distributed capabilities and to provide a way to run MLX models on Kubernetes.

For example, we can leverage the Kubeflow Training Operator for MLX model training and fine-tuning, and Kubeflow Katib for hyperparameter optimization.
Since Kind clusters support the ARM architecture, we should explore whether we can use M-series GPUs for MLX model training with Kind in the future.

In addition, I have seen examples of people running Kubernetes on multiple VMs with macOS machines and kubeadm.
That might be useful when a single machine can't handle a very large ML model.

cc @kubeflow/wg-training-leads @awni


gaocegege · 2024-04-10T04:27:39Z

> In addition to that, I saw examples how folks run Kubernetes on multi-VMs with MacOS machines and kubeadm.
> That might be useful when a single machine can't handle very large ML model.

Interesting. Does MLX support multi-node training?

awni · 2024-04-10T04:30:18Z

Not yet. We are working on it. Probably makes sense to follow up on this once we have some basic support there.

andreyvelich added the kind/feature label Apr 10, 2024
