ECS circuit breaker (question) #161

Open
stadskle opened this issue Feb 24, 2021 · 4 comments

Comments

@stadskle

Has anyone tested this tool with the new circuit breaker?

https://aws.amazon.com/blogs/containers/announcing-amazon-ecs-deployment-circuit-breaker/

It is on our roadmap, but we have not managed to do it yet, so I just wanted to hear if anyone has tried. I suspect it might require some changes in the deploy script to ensure that ecs deploy reports a failed deployment correctly if ECS aborts it?

@fabfuel
Owner

fabfuel commented Feb 25, 2021

Hi @stadskle,

I have not tested a deployment with ecs-deploy and an active circuit breaker configuration yet.

My gut feeling is that it should not cause problems: ecs-deploy only fetches the deployment state from ECS and waits until it changes to Completed. If that never happens, ecs-deploy will report the deployment as failed. I will follow up and share my findings here.
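
The wait-and-time-out behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual ecs-deploy code: `fetch_state`, `wait_for_completion`, `DeploymentTimeout`, the `"COMPLETED"` state string, and the default timeout and interval are all assumptions made for the example.

```python
import time


class DeploymentTimeout(Exception):
    """Raised when the deployment does not complete within the timeout."""


def wait_for_completion(fetch_state, timeout=300.0, interval=5.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_state() until it returns "COMPLETED" or the timeout expires.

    fetch_state is a hypothetical callable that returns the current
    deployment state as a string. clock and sleep are injectable so the
    loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if fetch_state() == "COMPLETED":
            return True
        if clock() >= deadline:
            # Nothing but the elapsed time tells us the deployment failed.
            raise DeploymentTimeout(
                f"deployment not completed after {timeout} seconds"
            )
        sleep(interval)
```

With a loop like this, a deployment that ECS has already aborted is indistinguishable from one that is merely slow until the timeout is reached.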

Best
Fabian

@stadskle
Author

Yes, I guess ecs deploy will time out and report the deployment as failed, as it does today. But it will not report Failed the moment the breaker kicks in, I guess? Here is an example from the blog post:

image

@fabfuel
Owner

fabfuel commented Feb 27, 2021

Yes, for now only a timeout would be reported if the whole process takes too long. Currently the check of the Deployment entity is not as explicit as it could be. I'll look into this a bit deeper, with the goal of reporting explicitly when the deployment has failed, regardless of whether the circuit breaker is activated, and at the time the original deployment failed (before an optional rollback).

During my tests I discovered that this new feature does not cover all cases of failing containers. For example, if you specify an invalid Docker CMD (which is what I did for a quick fix), this is not covered by the circuit breaker; the deployment will still retry forever in this case.

This is a known limitation, covered in this issue: aws/containers-roadmap#1206

Best
Fabian

@fabfuel
Owner

fabfuel commented Mar 3, 2021

The deployment check now utilizes the new rolloutState property of the ECS deployment entity. Until now, determining whether a deployment had finished had to be based on the number of stably running tasks of the expected task definition.

With this change, we can utilize the new circuit breaker feature and:

  1. monitor the number of failed tasks during a deployment and
  2. alert if the deployment has failed and the circuit breaker kicked in
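
As a rough illustration of the two points above (not the code from the feature branch), the sketch below reads `rolloutState`, `rolloutStateReason` and `failedTasks` from the `deployments` list of an ECS `describe_services` response, for example one returned by `boto3.client("ecs").describe_services(cluster=..., services=[...])`. The helper names `rollout_status` and `has_failed`, the `"UNKNOWN"` fallback, and the choice to match the deployment by its task definition ARN are assumptions made for this example.

```python
def rollout_status(response, task_definition):
    """Summarise the deployment of task_definition.

    response is a dict shaped like an ECS describe_services result.
    Returns None if no deployment uses the given task definition ARN.
    """
    for service in response.get("services", []):
        for deployment in service.get("deployments", []):
            if deployment.get("taskDefinition") == task_definition:
                return {
                    # "UNKNOWN" is a fallback chosen here, not an ECS value.
                    "state": deployment.get("rolloutState", "UNKNOWN"),
                    "reason": deployment.get("rolloutStateReason", ""),
                    "failed_tasks": deployment.get("failedTasks", 0),
                }
    return None


def has_failed(status):
    """True if the summarised deployment was marked FAILED by ECS."""
    return status is not None and status["state"] == "FAILED"
```

A polling loop could call `rollout_status` on each iteration, print `failed_tasks` as progress, and stop as soon as `has_failed` returns true instead of waiting for a timeout.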

Screenshot 2021-03-03 at 13 15 37

The feature is not released yet, but it is available in a feature branch for now, if anybody wants to chime in and test this new behavior.

Best
Fabian
