Agent/skypilot #2407

Draft
wants to merge 16 commits into
base: master

novahow
Contributor

@novahow novahow commented May 10, 2024

Tracking issue

flyteorg/flyte#3936

Why are the changes needed?

To add a SkyPilot agent to flytekit.

What changes were proposed in this pull request?

Please refer to the diagram

image

How was this patch tested?

Setup process

sky_test.py

from flytekit import task, workflow
from flytekitplugins.skypilot import SkyPilot

INTERNAL_IMAGE = "flytesky/plugins:skypilot"  # "cr.flyte.org/flyteorg/flytekit:py3.10-latest"


@task(
    task_config=SkyPilot(
        cluster_name="t3",
        # prompt_cloud=True,
        resource_config=[
            {
                "instance_type": "t2.micro"
                # "cloud": "kubernetes"
            }
        ],
    ),
    container_image=INTERNAL_IMAGE,
)
def t3(a: int) -> str:
    return str(a + 3)


@workflow
def wf(a: int = 3):
    # t3 is the only task defined in this file
    res = t3(a=a)
    print(res)


if __name__ == "__main__":
    wf()

pyflyte --verbose run --remote sky_test.py t3 --a "3"

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Contributor Author

novahow commented May 10, 2024

Encountered a StopIteration bug when using remote S3; still investigating.

image

Contributor Author

novahow commented May 25, 2024

from flytekit import task, workflow
from flytekitplugins.skypilot import SkyPilot

INTERNAL_IMAGE = "flytesky/plugins:skypilot"


# Spot instance, image run via docker, job managed by a controller cluster
@task(
    task_config=SkyPilot(
        cluster_name="t2",
        # prompt_cloud=True,
        resource_config={
            "instance_type": "e2-small",
            "use_spot": True,
        },
        container_run_type=1,
        job_launch_type=1,
        # stop_after=3
    ),
    container_image=INTERNAL_IMAGE,
)
def t3(a: int) -> str:
    return str(a + 3)


# GPU task with an ordered list of candidate resources
@task(
    task_config=SkyPilot(
        cluster_name="t4",
        resource_config={
            "ordered": [
                {
                    "cloud": "gcp",
                    "accelerators": "T4:1",
                    "instance_type": "n1-standard-2",
                },
                {
                    "cloud": "gcp",
                    "accelerators": "P4:1",
                },
            ]
        },
        container_run_type=0,
        setup="python -m pip install torch",
    ),
    container_image=INTERNAL_IMAGE,
)
def cuda_task() -> str:
    import torch

    return f"cuda on: {torch.cuda.is_available()}"


@workflow
def ml_wf():
    res = cuda_task()
    print(res)
container_run_type:
- 0: use the image as the VM image, or pull it inside the VM
- 1: use docker run
Currently `1` is recommended, as SkyPilot alters your original image.
job_launch_type:
- 0: launch a cluster
- 1: launch a controller cluster to manage your job
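
The two integer options above are easy to mix up at a call site, so a small sketch of named constants may help when reading task configs. The enum and function names here (`ContainerRunType`, `JobLaunchType`, `describe`) are hypothetical and are not part of the plugin in this PR; only the integer values and their meanings come from the description above.

```python
from enum import IntEnum


class ContainerRunType(IntEnum):
    """Hypothetical names for the container_run_type integers."""

    VM_IMAGE = 0    # use the image as the VM image, or pull it inside the VM
    DOCKER_RUN = 1  # run the image with `docker run`


class JobLaunchType(IntEnum):
    """Hypothetical names for the job_launch_type integers."""

    CLUSTER = 0     # launch a cluster
    CONTROLLER = 1  # launch a controller cluster to manage the job


def describe(container_run_type: int, job_launch_type: int) -> str:
    """Return a readable summary; raises ValueError for unknown integers."""
    run = ContainerRunType(container_run_type)
    launch = JobLaunchType(job_launch_type)
    return f"container_run_type={run.name}, job_launch_type={launch.name}"


print(describe(1, 1))
```

Because `IntEnum` members compare equal to plain integers, such constants could be passed wherever the raw `0`/`1` values are expected.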
FROM localhost:30000/flytekit:latest as base
ARG PYTHON_VERSION

MAINTAINER Flyte Team <[email protected]>
LABEL org.opencontainers.image.source=https://github.com/flyteorg/flytekit

WORKDIR /root
ENV FLYTE_SDK_RICH_TRACEBACKS 0

# Version of flytekit to be installed in the image
ARG PSEUDO_VERSION
RUN SETUPTOOLS_SCM_PRETEND_VERSION_FOR_FLYTEKIT=$PSEUDO_VERSION pip install --no-cache-dir -U \
        skypilot

USER root
RUN apt-get update && apt-get install sudo socat locales -y
RUN sudo locale-gen en_US.UTF-8
RUN deluser --remove-home flytekit
RUN useradd -u 1000 -m -d /home/flytekit flytekit
USER flytekit
# Note: Pod tasks should be exposed in the default image
# Note: Some packages will create config files under /home by default, so we need to make sure it's writable
# Note: There are use cases that require reading and writing files under /tmp, so we need to change its permissions.

# Run a series of commands to set up the environment:
# 1. Update and install dependencies.
# 2. Install Flytekit and its plugins.
# 3. Clean up the apt cache to reduce image size. Reference: https://gist.github.com/marvell/7c812736565928e602c4
# 4. Create a non-root user 'flytekit' and set appropriate permissions for directories.

FROM base as dev

COPY . /flytekit

RUN SETUPTOOLS_SCM_PRETEND_VERSION_FOR_FLYTEKIT=$PSEUDO_VERSION pip install --no-cache-dir -U \
        -e /flytekit \
        -e /flytekit/plugins/flytekit-skypilot


USER root
ENV PYTHONPATH "/flytekit:/flytekit/plugins/flytekit-k8s-pod:/flytekit/plugins/flytekit-deck-standard:"
# ENV FLYTE_AWS_ENDPOINT "http://localhost:30080/"
# ENV FLYTE_AWS_ACCESS_KEY_ID "minio"                           
# ENV FLYTE_AWS_SECRET_ACCESS_KEY "miniostorage"
RUN echo "flytekit ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
SHELL ["/bin/bash", "-c"]
# RUN echo "SHELL=/bin/bash" >> /etc/profile
# RUN rm /bin/sh && ln -s /bin/bash /bin/sh
# Switch to the 'flytekit' user for better security
USER flytekit
# ENTRYPOINT ["/bin/bash"]
# CMD ["/bin/bash"]

Secrets

AWS

aws-configure/aws_access_key_id
aws-configure/aws_secret_access_key

GCP (service_account)

gcloud/client_email
gcloud/private_key
gcloud/project_id
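
As a rough sketch, the secret names listed above can be read as `group/key` pairs. That reading is an assumption based on how flytekit's `Secret(group=..., key=...)` is shaped; the PR description does not spell it out. The `SecretRef` class and `parse_secret` helper are hypothetical and only illustrate the split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecretRef:
    """Hypothetical holder for one secret, split into group and key."""

    group: str
    key: str


def parse_secret(path: str) -> SecretRef:
    """Split 'group/key' at the first slash; reject malformed input."""
    group, sep, key = path.partition("/")
    if not sep or not group or not key:
        raise ValueError(f"expected 'group/key', got {path!r}")
    return SecretRef(group=group, key=key)


# Secret names copied from the lists above
AWS_SECRETS = [
    parse_secret(p)
    for p in (
        "aws-configure/aws_access_key_id",
        "aws-configure/aws_secret_access_key",
    )
]
GCP_SECRETS = [
    parse_secret(p)
    for p in (
        "gcloud/client_email",
        "gcloud/private_key",
        "gcloud/project_id",
    )
]

for ref in AWS_SECRETS + GCP_SECRETS:
    print(ref.group, ref.key)
```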

@novahow novahow closed this May 25, 2024
@novahow novahow reopened this May 25, 2024
Signed-off-by: novahow <[email protected]>
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
Member

Does Sky support Python 3.11 and 3.12?

Contributor Author

They said the nightly version of SkyPilot supports 3.11. A stable version that supports 3.11 will be released soon.


codecov bot commented May 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.61%. Comparing base (bf38b8e) to head (303219d).
Report is 121 commits behind head on master.

Current head 303219d differs from pull request most recent head 9bdddb7

Please upload reports for the commit 9bdddb7 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2407      +/-   ##
==========================================
+ Coverage   83.04%   86.61%   +3.57%     
==========================================
  Files         324        3     -321     
  Lines       24861      142   -24719     
  Branches     3547        0    -3547     
==========================================
- Hits        20645      123   -20522     
+ Misses       3591       19    -3572     
+ Partials      625        0     -625     
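
The percentages in the table can be reproduced from the Hits and Lines rows. The sketch below assumes coverage is hits divided by lines and that the report truncates (rather than rounds) to two decimals; that truncation is an inference from these numbers, not something the report states. The helper name `coverage_percent` is hypothetical.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage, truncated to two decimals (assumed behavior)."""
    # Integer arithmetic avoids floating-point surprises before truncation
    return (hits * 10000 // lines) / 100


base = coverage_percent(20645, 24861)  # master column
head = coverage_percent(123, 142)      # #2407 column
print(base, head, round(head - base, 2))
```

The head column covers only 3 files and 142 lines, so the +3.57% change compares a partial report against the whole project, which is consistent with the report's note asking for reports for the latest commit.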

