Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support for using spot instances #171

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

troinine
Copy link

Closes #169

This PR adds an optional variable to request EC2 Spot instances which are significantly more cost-effective for running non-critical workloads compared on-demand pricing.

The default functionality remains the same. If you do not define market-type, you get on-demand instances. Tested to work on an actual AWS account:

Screenshot 2023-11-24 at 11 24 30

Start:

Run troinine/ec2-github-runner@feature/support-spot-instances
  with:
    mode: start
    github-token: -- redacted --
    ec2-image-id: -- redacted --
    ec2-instance-type: t3a.large
    subnet-id: -- redacted --
    security-group-id: -- redacted --
    market-type: spot
    aws-resource-tags: -- redacted --
  
GitHub Registration Token is received
AWS EC2 instance i-0b010... is started
AWS EC2 instance i-0b010... is up and running

Stop:

Run troinine/ec2-github-runner@feature/support-spot-instances
  with:
    mode: stop
    github-token: -- redacted --
    label: -- redacted --
    ec2-instance-id: i-0b010...
    aws-resource-tags: -- redacted --

AWS EC2 instance i-0b010... is terminated

P.S. I have some future improvement ideas like reverting to on-demand capacity if the spot request cannot be fulfilled but I wanted to keep it simple for now, especially since there are no tests in the project (which are another area of improvement).

@tknisch
Copy link

tknisch commented Nov 24, 2023

I've tested the new feature and it worked fine:
image

@troinine
Copy link
Author

@machulav, kindly requesting your opinion on this. Would it be possible to merge and release this feature?

@Dmitry1987
Copy link

great work! spot requests are really missing for the one-off builds, thanks for working on this @troinine . I think actions like this one should be adopted by aws official or by github official repos 🧐 because it's kind of a must have functionality for either of these companies.

@Dmitry1987
Copy link

What about the bid price though, is it automatic on aws side when that function is called, and uses the current market price? Has a bit more chance of termination for long builds then, isn't it? if ec2 gets quickly outbid by other requests.

@troinine
Copy link
Author

troinine commented Dec 7, 2023

What about the bid price though, is it automatic on aws side when that function is called, and uses the current market price? Has a bit more chance of termination for long builds then, isn't it? if ec2 gets quickly outbid by other requests.

Good point. Initially, I allowed for a configurable bid price, but after considering some experiences and reviewing the AWS documentation, I reverted to using defaults. Not defining a bid price gives the best chance of running the instance uninterrupted.

According to the RunInstances API documentation:

We do not recommend using this parameter because it can lead to increased interruptions.

@Dmitry1987
Copy link

I wonder if the linter status is a blocker for it to get merged 🧐 can i help somehow push it through? 😅 i want to use the main release so waiting for it to get merged. btw there's this "cpu options" to make "threads_per_core=1", it seems not available in current options. maybe we should allow an array of all possible aws sdk options to be passed as a whole?
for example instead of adding options and mapping them to a new yaml field, like the spot and market

      MarketType: config.input.marketType,
      SpotOptions: {
        SpotInstanceType: 'one-time',
      },

it could be a good option to let the user chose any 'input' fields to append, like 'extraOptions:' and whatever goes there. because there are so many options to the ec2 spawn operation. storage mapping, networking stuff, launch template to use, resource group, etc. what do you think?

@Dmitry1987
Copy link

Hi all hope you're having a good new year 🎉
I merged this to my fork and noticed that runner won't connect. How do we troubleshoot when this happens? there are not much logs and I have no idea what processes are running... it's the same exact AMI that is working with vanilla release of the action, so one of the two PRs I merged are affecting something, but I don't know how to approach debugging this 🤔

Building data script for linux
AWS EC2 instance i-xxxxx is started
AWS EC2 instance i-xxxxx is up and running
Waiting [30]s for the AWS EC2 instance to be registered in GitHub as a new self-hosted runner
Checking every 10s if the GitHub self-hosted runner is registered
Checking...
Checking...
Checking...
Checking...
Error: GitHub self-hosted runner registration error
Checking...
Error: A timeout of 5 minutes is exceeded. Your AWS EC2 instance was not able to register itself in GitHub as a new self-hosted runner.

@tknisch
Copy link

tknisch commented Jan 4, 2024

Hi all hope you're having a good new year 🎉 I merged this to my fork and noticed that runner won't connect. How do we troubleshoot when this happens? there are not much logs and I have no idea what processes are running... it's the same exact AMI that is working with vanilla release of the action, so one of the two PRs I merged are affecting something, but I don't know how to approach debugging this 🤔

Building data script for linux
AWS EC2 instance i-xxxxx is started
AWS EC2 instance i-xxxxx is up and running
Waiting [30]s for the AWS EC2 instance to be registered in GitHub as a new self-hosted runner
Checking every 10s if the GitHub self-hosted runner is registered
Checking...
Checking...
Checking...
Checking...
Error: GitHub self-hosted runner registration error
Checking...
Error: A timeout of 5 minutes is exceeded. Your AWS EC2 instance was not able to register itself in GitHub as a new self-hosted runner.

You could activate the Instance Termination Protection, right after the instance has launched. This will prohibit the action from terminating the instance. Then you will have enough time to connect to the instance (e.g. via SSH oder SSM agent) and have a look if the GitHub Runner is active

@Dmitry1987
Copy link

basically user-data script had syntax errors, because I merged a mix of windows support and spots, while making a mess in that js file 😁 works nicely. i hope this gets merged.

@mission-jseverance
Copy link

Any idea when this could get merged?

@Preen
Copy link

Preen commented Mar 11, 2024

+1

@hassanazharkhan
Copy link

@machulav Could you please prioritise your review on this one pls? it will be a great addition! 🙏🏻

@crohr
Copy link

crohr commented May 23, 2024

I'm not sure whether this project is still maintained, but for anyone interested I recently launched RunsOn, which supports spot instances, arm64, compatible images, etc. and is backed by many clients so it won't go away anytime soon.

This project was an inspiration for that work, so hopefully it continues to live since it fills a specific use-case well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature request] Allow requesting spot instance instead of an on-demand instance.
7 participants