
Unable to create GCP CPU Instances #106

khumairraj opened this issue Sep 1, 2021 · 7 comments

@khumairraj

Hi there, thank you for the amazing tool.

I have been trying to use spotty to create a CPU instance on GCP.
Below is the spotty.yaml file that I am using.

project:
  name: spotty-heareval
  syncFilters:
    - exclude:
        - .idea/*
        - .git/*
        - '*/__pycache__/*'
        - _workdir/*
        - tasks/*
        - embeddings/*
        - .mypy_cache/*
        - lightning_logs/*
        - heareval.egg-info/*
        - pretrained/*
        - wandb/*

containers:
  - projectDir: /workspace/project
    image: alpine
    volumeMounts:
      - name: workspace
        mountPath: /workspace
    runtimeParameters: ['--shm-size', '20G']

instances:
  - name: spotty-heareval-dp-khumairraj
    provider: gcp
    parameters:
      zone: europe-west4-a
      machineType: n1-standard-1
      imageUri: projects/ml-images/global/images/c0-deeplearning-common-cpu-v20210818-debian-10
      volumes:
        - name: workspace
          parameters:
            size: 250
            mountDir: /workspace

The error that comes up is:

Waiting for the stack to be created...
  - launching the instance...
  - running the Docker container...
  Error:
  ------
  Deployment "spotty-instance-spotty-heareval-spotty-heareval-dp-khumairraj" failed.
  Error: {"ResourceType":"runtimeconfig.v1beta1.waiter","ResourceErrorCode":"412","ResourceErrorMessage":"Failure condition satisfied."}

Please let me know if I am missing something in the configuration, or if there is a known solution.
Thanks!

@turian

turian commented Sep 2, 2021

@apls777 I am having the same issue with this spotty.yaml:

# You must delete disks manually on GCP :\
# https://console.cloud.google.com/compute/disks?project=hear2021-evaluation

project:
  name: hearpreprocess
  syncFilters:
    - exclude:
        - .idea/*
        - .git/*
        - '*/__pycache__/*'
        - _workdir/*
        - tasks/*
        - .mypy_cache/*
        - hearpreprocess.egg-info/*

containers:
  - projectDir: /workspace/project
    image: turian/hearpreprocess
    volumeMounts:
      - name: workspace
        mountPath: /workspace
    runtimeParameters: ['--shm-size', '20G']

instances:
  - name: hearpreprocess-i1-joseph
    provider: gcp
    parameters:
      zone: europe-west4-a
      machineType: c2-standard-16
      preemptibleInstance: False
      # gcloud compute images list
      # https://console.cloud.google.com/compute/images?project=hear2021-evaluation
      imageUri: projects/ubuntu-os-cloud/global/images/ubuntu-2004-focal-v20210825
      volumes:
        - name: workspace
          parameters:
            # Be careful to delete this if you're not using it!
            size: 2000
# Not implemented for GCP, all volumes will be retained
#            deletionPolicy: retain
            mountDir: /workspace

scripts:
  clean: |
    bash clean.sh

@turian

turian commented Sep 6, 2021

@apls777 Let me know if you have any ideas about this! Thank you!

@apls777
Collaborator

apls777 commented Sep 28, 2021

@khumairraj @turian Sorry for the delay in getting back to you. The issue in both of your configs is the mountDir: /workspace parameter. Just remove it and it will work.

Usually, there is no need to specify the instances[]...volumes[]...mountDir parameter. It customizes where the disk is mounted on the host OS. By default, Spotty mounts the disk somewhere under the /mnt/... directory, and that directory is then mounted inside your container at /workspace, as specified by the containers[].volumeMounts[].mountPath parameter. It is still a bug, though, because it should work even when a custom mountDir is specified, so I'll leave this issue open until it's fixed.
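
For example, the volumes section of the first config above would become the following (a minimal sketch; everything else stays unchanged):

      volumes:
        - name: workspace
          parameters:
            # mountDir removed: Spotty mounts the disk under /mnt/... on the host,
            # and the container still sees it at /workspace via the
            # containers[].volumeMounts[].mountPath parameter
            size: 250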

@khumairraj you also have another issue in your config. Spotty expects bash to be installed inside the Docker image; if it isn't, you won't be able to connect to the container. So don't use the raw alpine image; instead, create a custom Dockerfile that inherits from alpine and installs bash on top, as sketched below.
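
A minimal sketch of such a Dockerfile (assuming the stock alpine base image; apk is Alpine's package manager):

# Inherit from alpine and add bash on top
FROM alpine
# bash is needed so Spotty can open a shell inside the container
RUN apk add --no-cache bash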

@turian

turian commented Nov 8, 2021

@apls777 I tried this, but it doesn't work yet. I am running spotty from master.

Here is my latest spotty.yaml; it is the same as above, but with mountDir removed:

# You must delete disks manually on GCP :\
# https://console.cloud.google.com/compute/disks?project=hear2021-evaluation

project:
  name: hearpreprocess
  syncFilters:
    - exclude:
        - .idea/*
        - .git/*
        - '*/__pycache__/*'
        - _workdir/*
        - tasks/*
        - .mypy_cache/*
        - hearpreprocess.egg-info/*

containers:
  - projectDir: /workspace/project
    image: turian/hearpreprocess
    volumeMounts:
      - name: workspace
        mountPath: /workspace
    runtimeParameters: ['--shm-size', '32G']

instances:
  - name: hearpreprocess-cpu-joseph
    provider: gcp
    parameters:
      zone: europe-west4-a
      machineType: c2-standard-16
      preemptibleInstance: False
      # gcloud compute images list
      # https://console.cloud.google.com/compute/images?project=hear2021-evaluation
      imageUri: projects/ubuntu-os-cloud/global/images/ubuntu-2004-focal-v20210825
      volumes:
        - name: workspace
          parameters:
            # Be careful to delete this if you're not using it!
            size: 2000
# Not implemented for GCP, all volumes will be retained
#            deletionPolicy: retain
#            mountDir: /workspace

scripts:
  clean: |
    bash clean.sh

After spotty sh, it says that docker is not found.

When I run spotty start -C, I get:

CommandException: arg (/mnt/hearpreprocess-hearpreprocess-cpu-joseph-workspace/project) does not name a directory, bucket, or bucket subdir.
If there is an object with the same path, please add a trailing
slash to specify the directory.
Connection to 34.91.169.30 closed.
Error:
------
Failed to download files from the bucket to the instance

Why? Note that I changed the instance name to make sure I have a fresh disk, as discussed in #108.

@khumairraj
Author

project:
  name: hearprep
  syncFilters:
    - exclude:
        - .idea/*
        - .git/*
        - '*/__pycache__/*'
        - _workdir/*
        - tasks/*
        - .mypy_cache/*
        - hearpreprocess.egg-info/*

containers:
  - projectDir: /workspace/project
    image: turian/hearpreprocess
    volumeMounts:
      - name: workspace
        mountPath: /workspace
    runtimeParameters: ['--shm-size', '32G']

instances:
  - name: hearprep-cpui2-delkhumair
    provider: gcp
    parameters:
      zone: europe-west4-a
      machineType: c2-standard-16
      preemptibleInstance: False
      imageUri: projects/ubuntu-os-cloud/global/images/ubuntu-2004-focal-v20210825
      volumes:
        - name: workspace
          parameters:
            size: 1000

I also tried the above config and got the error below:

Creating disks...
  - disk "hearprep-hearprep-cpui2-delkhumair-workspace" was created

Preparing the deployment template...
  - image URL: projects/ubuntu-os-cloud/global/images/ubuntu-2004-focal-v20210825
  - zone: europe-west4-a
  - on-demand VM
  - no GPUs

Volumes:
+-----------+------------+------+-----------------+
| Name      | Mount Path | Type | Deletion Policy |
+===========+============+======+=================+
| workspace | /workspace | Disk | Retain Volume   |
+-----------+------------+------+-----------------+

Waiting for the stack to be created...
  - launching the instance...
  - running the Docker container...
  Error:
  ------
  Deployment "spotty-instance-hearprep-hearprep-cpui2-delkhumair" failed.
  Error: {"ResourceType":"runtimeconfig.v1beta1.waiter","ResourceErrorCode":"412","ResourceErrorMessage":"Failure condition satisfied."}

Is there something I am missing? Thanks again for all your help!

@apls777
Collaborator

apls777 commented Nov 15, 2021

@khumairraj Most likely, it's an issue with the GCP image. Try updating it to the latest one; see my reply here.
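
For example, updating the imageUri in your config to a newer dated release would look something like this (this exact image name appears later in this thread):

      imageUri: projects/ml-images/global/images/c0-deeplearning-common-cpu-v20211118-debian-10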

@turian

turian commented Nov 21, 2021

@apls777 I upgraded to the latest Ubuntu image, but it doesn't have docker by default the way the deep learning images usually do:

# You must delete disks manually on GCP :\
# https://console.cloud.google.com/compute/disks?project=hear2021-evaluation

project:
  name: hearpreprocess
  syncFilters:
    - exclude:
        - .idea/*
        - .git/*
        - '*/__pycache__/*'
        - _workdir/*
        - tasks/*
        - .mypy_cache/*
        - hearpreprocess.egg-info/*

containers:
  - projectDir: /workspace/project
    image: turian/hearpreprocess
    volumeMounts:
      - name: workspace
        mountPath: /workspace
    runtimeParameters: ['--shm-size', '32G']

instances:
  - name: hearpreprocess-cpu-joseph
    provider: gcp
    parameters:
      zone: europe-west4-a
      #machineType: c2-standard-16
      #machineType: e2-standard-32
      machineType: c2-standard-60
      preemptibleInstance: False
      # gcloud compute images list
      # https://console.cloud.google.com/compute/images?project=hear2021-evaluation
      imageUri: projects/ubuntu-os-cloud/global/images/ubuntu-2004-focal-v20211118
      volumes:
        - name: workspace
          parameters:
            size: 5000

scripts:
  clean: |
    bash clean.sh

Then I run spotty sh -H and execute cat /var/log/startup-script.log:

bash: docker: command not found
Container is not running.
Use the "spotty start -C" command to start it.

If instead I switch the image to the latest CPU deep learning image:

      imageUri: projects/ml-images/global/images/c0-deeplearning-common-cpu-v20211118-debian-10

I get the following weird error in /var/log/startup-script.log:

0 upgraded, 0 newly installed, 0 to remove and 5 not upgraded.
+ echo 'bind-key x kill-pane'
++ dirname /tmp/spotty/instance/scripts/container_bash.sh
+ mkdir -p /tmp/spotty/instance/scripts
+ cat
+ chmod +x /tmp/spotty/instance/scripts/container_bash.sh
+ CONTAINER_BASH_ALIAS=container
+ echo 'alias container="/tmp/spotty/instance/scripts/container_bash.sh"'
+ echo 'alias container="/tmp/spotty/instance/scripts/container_bash.sh"'
+ mkdir -pm 777 /tmp/spotty
+ mkdir -pm 777 /tmp/spotty/containers
+ /tmp/spotty/instance/scripts/startup/02_mount_volumes.sh
+ DEVICE_NAMES=("disk-1")
+ MOUNT_DIRS=("/mnt/hearpreprocess-hearpreprocess-cpu-joseph-workspace")
+ for i in ${!DEVICE_NAMES[*]}
+ DEVICE=/dev/disk/by-id/google-disk-1
+ MOUNT_DIR=/mnt/hearpreprocess-hearpreprocess-cpu-joseph-workspace
+ blkid -o value -s TYPE /dev/disk/by-id/google-disk-1
+ mkfs -t ext4 /dev/disk/by-id/google-disk-1
mke2fs 1.44.5 (15-Dec-2018)
/dev/disk/by-id/google-disk-1 is apparently in use by the system; will not make a filesystem here!

and spotty start -C gives:

CommandException: arg (/mnt/hearpreprocess-hearpreprocess-cpu-joseph-workspace/project) does not name a directory, bucket, or bucket subdir.
If there is an object with the same path, please add a trailing
slash to specify the directory.
