Hi team, the Runhouse docs for on-demand clusters were not very clear about the expected format of image_id. Helpfully, my initial attempts to bring up a GCP cluster with e.g. image_id="pytorch-latest-cpu" (taken from the GCP docs) raised a clear error: ValueError: Image 'pytorch-latest-cpu' not found in GCP.
I ended up going into the SkyPilot repo for clarification and found a GCP example in their YAML spec: projects/deeplearning-platform-release/global/images/family/tf2-ent-2-1-cpu-ubuntu-2004
I modified that path for the image I wanted, projects/deeplearning-platform-release/global/images/family/pytorch-1-13-cpu-v20230807-debian-11-py310. Runhouse accepted the submission, but it then hung until it timed out, and I saw no sign in the GCP Console that the instance was coming up.
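For reference, Compute Engine accepts two path forms for images: projects/<project>/global/images/family/<family> resolves to the newest non-deprecated image in a family, while projects/<project>/global/images/<image> pins one specific image. A minimal sketch of how wrapping code might tell the two forms apart, and reject a bare name like pytorch-latest-cpu up front, is below. `parse_gcp_image_id` and `GcpImageRef` are hypothetical names for illustration, not part of the Runhouse or SkyPilot API.

```python
import re
from typing import NamedTuple


class GcpImageRef(NamedTuple):
    project: str
    kind: str  # "family" or "image"
    name: str


# Assumed pattern covering the two Compute Engine path forms:
#   projects/<project>/global/images/family/<family>
#   projects/<project>/global/images/<image>
_IMAGE_PATH = re.compile(
    r"^projects/(?P<project>[^/]+)/global/images/"
    r"(?:(?P<family>family)/)?(?P<name>[^/]+)$"
)


def parse_gcp_image_id(image_id: str) -> GcpImageRef:
    """Split a GCP image path into project, kind, and name (hypothetical helper)."""
    match = _IMAGE_PATH.match(image_id)
    if match is None:
        raise ValueError(
            f"Unrecognized GCP image_id {image_id!r}; expected "
            "'projects/<project>/global/images/family/<family>' or "
            "'projects/<project>/global/images/<image>'."
        )
    kind = "family" if match.group("family") else "image"
    return GcpImageRef(match.group("project"), kind, match.group("name"))
```

Note that this only checks the shape of the string; a well-formed path can still name a family or image that does not exist.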
I then tried a similar command via sky launch and saw the error, which I reported to them in this GitHub issue. I am raising it here as well in case you want to update your wrapper code to catch this error.
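One way wrapper code could fail fast instead of hanging is to ask the Compute Engine API whether the image or family resolves before launching anything. The sketch below is an assumption, not Runhouse code: `gcp_image_resolves` is a hypothetical helper that uses `images().get` and `images().getFromFamily` from google-api-python-client (already in the package list above) with Application Default Credentials, and treats an HTTP 404 as "not found". The `service` argument exists only so the check can be exercised without credentials.

```python
from typing import Any, Optional


def gcp_image_resolves(project: str, kind: str, name: str,
                       service: Optional[Any] = None) -> bool:
    """Return True if the image (or image family) exists, False on HTTP 404.

    Hypothetical pre-flight check. If `service` is None, a Compute Engine v1
    client is built with Application Default Credentials.
    """
    if service is None:
        # Imported lazily so the helper can be tested without the client library.
        from googleapiclient import discovery
        service = discovery.build("compute", "v1")
    images = service.images()
    if kind == "family":
        request = images.getFromFamily(project=project, family=name)
    else:
        request = images.get(project=project, image=name)
    try:
        request.execute()
    except Exception as err:
        # googleapiclient's HttpError carries the HTTP response as `err.resp`;
        # only a 404 is treated as "missing", anything else is re-raised.
        status = getattr(getattr(err, "resp", None), "status", None)
        if status == 404:
            return False
        raise
    return True
```

A wrapper could then raise the same kind of ValueError Runhouse already produces for bare names whenever this returns False, before submitting the cluster.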
Versions
Python Platform: Linux-6.4.12-arch1-1-x86_64-with-glibc2.38
Python Version: 3.10.13 (main, Sep 4 2023, 15:52:34) [GCC 13.2.1 20230801]
Relevant packages:
boto3==1.28.40
fastapi==0.103.1
fsspec==2023.5.0
gcsfs==2023.5.0
google-api-python-client==2.97.0
google-cloud-storage==2.10.0
pyarrow==13.0.0
pycryptodome==3.12.0
rich==13.5.2
runhouse==0.0.11
skypilot==0.3.3
sshfs==2023.7.0
sshtunnel==0.4.0
typer==0.9.0
uvicorn==0.23.2
wheel==0.41.2
Checking credentials to enable clouds for SkyPilot.
AWS: disabled
Reason: AWS credentials are not set. Run the following commands:
$ pip install boto3
$ aws configure
$ aws configure list # Ensure that this shows identity is set.
For more info: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html
Details: `aws sts get-caller-identity` failed with error: [botocore.exceptions.NoCredentialsError] Unable to locate credentials.
Azure: disabled
Reason: ~/.azure/msal_token_cache.json does not exist. Run the following commands:
$ az login
$ az account set -s <subscription_id>
For more info: https://docs.microsoft.com/en-us/cli/azure/get-started-with-azure-cli
GCP: enabled
Lambda: disabled
Reason: Failed to access Lambda Cloud with credentials. To configure credentials, go to:
https://cloud.lambdalabs.com/api-keys
to generate API key and add the line
api_key = [YOUR API KEY]
to ~/.lambda_cloud/lambda_keys
IBM: disabled
Reason: Missing credential file at /home/user/.ibm/credentials.yaml.
Store your API key and Resource Group id in ~/.ibm/credentials.yaml in the following format:
iam_api_key: <IAM_API_KEY>
resource_group_id: <RESOURCE_GROUP_ID>
SCP: disabled
Reason: Failed to access SCP with credentials. To configure credentials, see: https://cloud.samsungsds.com/openapiguide
Generate API key and add the following line to ~/.scp/scp_credential:
access_key = [YOUR API ACCESS KEY]
secret_key = [YOUR API SECRET KEY]
project_id = [YOUR PROJECT ID]
OCI: disabled
Reason: `oci` is not installed. Install it with: pip install oci
For more details, refer to: https://skypilot.readthedocs.io/en/latest/getting-started/installation.html#oracle-cloud-infrastructure-oci
Cloudflare (for R2 object store): disabled
Reason: [r2] profile is not set in ~/.cloudflare/r2.credentials. Additionally, Account ID from R2 dashboard is not set. Run the following commands:
$ pip install boto3
$ AWS_SHARED_CREDENTIALS_FILE=~/.cloudflare/r2.credentials aws configure --profile r2
$ mkdir -p ~/.cloudflare
$ echo <YOUR_ACCOUNT_ID_HERE> > ~/.cloudflare/accountid
For more info: https://skypilot.readthedocs.io/en/latest/getting-started/installation.html#cloudflare-r2
SkyPilot will use only the enabled clouds to run tasks. To change this, configure cloud credentials, and run sky check.
If any problems remain, please file an issue at https://github.com/skypilot-org/skypilot/issues/new
Clusters
NAME LAUNCHED RESOURCES STATUS AUTOSTOP COMMAND
Managed spot jobs
No in progress jobs. (See: sky spot -h)