Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Revisit docker restart policy #1734

Closed
ryan-bever opened this issue Dec 1, 2015 · 47 comments
Closed

[RFC] Revisit docker restart policy #1734

ryan-bever opened this issue Dec 1, 2015 · 47 comments

Comments

@ryan-bever
Copy link
Contributor

ryan-bever commented Dec 1, 2015

RFC from @josegonzalez


I think the way to go here is to set a config var like DOKKU_RESTART_POLICY. This variable could contain the exact restart policy being applied to an application. An alternative is to use the docker-options plugin.

Valid restart policies would match the following bash regular expression:

^(no|unless-stopped|always|on-failure(:[1-9][0-9]*)?)$

If you unset that config variable - or if it did not yet exist - it would be reset to the initial default of on-failure:10 Removing the docker-options entry would simply remove all restart-policies. Restart policies would apply to all processes being managed.

Potential porcelain around the plumbing:

# show the restart policy
dokku ps:restart-policy APP

# set the restart policy for a given application
dokku ps:set-restart-policy APP on-failure:10

Original post


Is it time to look into the restart policy again with docker 1.9.x out? Add a new unless-stopped restart policy (#15348) was added in 1.9.0

See:
#1591
#1505

Repasting my message from #1591 (comment) below:

OK, I was able to test out the docker-option --restart=unless-stopped. Based on the testing that I have performed this behaves as I would desire.

Below is a summary of my testing:
I have a test-app deployed with DOKKU_SCALE=2

    dokku docker-options:add test-app deploy --restart=unless-stopped
    sudo reboot

After setting the restart option following any reboot I would see the following behavior:

  • The initial docker ps shows 4 containers up.
  • After a couple of minutes it shows only the 2 expected containers up.

docker inspect showed for both containers:

"RestartPolicy": {
            "Name": "unless-stopped",
            "MaximumRetryCount": 0
        }

After causing the app to crash (process.exit(1)) each container twice:
docker inspect showed for both containers:

"RestartCount": 2

Then sudo reboot

  • See reboot behavior description above.

Execute dokku ps:restart test-app 3 Times in quick succession:

  • docker ps Shows 8 containers up for 1-2 minutes
  • Then back down to expected 2

Then sudo reboot

  • See reboot behavior description above.

Then dokku ps:stop test-app

  • docker ps shows no containers running

Then sudo reboot

  • This time only two containers come up.
  • This is the same as doing a ps:stop and reboot with an app that does not have the restart policy.

Push new version of the test-app

  • Behaves the same as the reboot behavior description above.
  • docker inspect shows that the RestartPolicy is retained for both containers.

Test crash the test-app

  • the containers are still restarted

Then sudo reboot

  • Same reboot behavior described above
@ryan-bever
Copy link
Contributor Author

@josegonzalez mentioned "What happens on memory-constrained systems? Can you try this on a box with 512mb ram and 512mb swap?" in #1591 (comment)

Let me see if I can get that running

@michaelshobbs
Copy link
Member

Are we certain this is necessary with the $DOKKU_RESTORE functionality that was recently added?

Reference: #1613 #1735

@ryan-bever
Copy link
Contributor Author

Using the --restart policy allows containers that have crashed/exited i.e. because of an application bug to be restarted by docker without a reboot of the server. Does #1613 provide that?

@michaelshobbs
Copy link
Member

oh yeah forgot about that 😄

@ryan-bever
Copy link
Contributor Author

Of course that would never happen to one of my apps 😉

@ryan-bever
Copy link
Contributor Author

I tried this on a new VirtualBox VM where after going through the above exercise top shows:

KiB Mem:    500972 total,   271812 used,   229160 free,    13852 buffers
KiB Swap:   520188 total,    32568 used,   487620 free.    86396 cached Mem

During the ps:restart in quick succession portion I did see the memory usage bump up to:

KiB Mem:    500972 total,   494648 used,     6324 free,     3852 buffers
KiB Swap:   520188 total,    93064 used,   427124 free.    58064 cached Mem

But it came back down after the previous containers were killed.

@josegonzalez what were you expecting from this exercise?

@josegonzalez
Copy link
Member

My only concern is that this would potentially go haywire on systems with low amounts of memory.

@ryan-bever
Copy link
Contributor Author

Do we want to add a restart policy by default when running containers? It would require docker 1.9.x.

Or maybe alternatively, if we don't want to make docker 1.9.x a requirement, provide a application level Configuration option where a user can specify a default restart policy to be applied to new applications, that could default to no --restart docker option applied.

@josegonzalez
Copy link
Member

@beverku We can make 1.9.x a requirement in 0.5.0.

@ryan-bever
Copy link
Contributor Author

@josegonzalez, Yes that requirement would work for me.

@josegonzalez
Copy link
Member

@michaelshobbs anything we need to worry about with re-enabling this?

@beverku does this feature have an "analog" for configuration in heroku?

@josegonzalez
Copy link
Member

@beverku want to make this change for us as a pr?

@ryan-bever
Copy link
Contributor Author

I don't know, I don't actually use Heroku, but I'll poke around a bit and see if I can see it. I'll start working on a PR. It's going to be later in the week before I can start working on it though.

@christiangenco
Copy link
Contributor

What's the current best practice for auto-restarting crashed processes? Should I run that dokku docker-options:add test-app deploy --restart=unless-stopped manually?

I've got an app in production that's getting its workers killed every once in a while, and I'm not sure the best course of action.

@josegonzalez
Copy link
Member

@christiangenco thats probably what you want, though I don't know how that performs with server reboots.

@christiangenco
Copy link
Contributor

@josegonzalez Neat - I'll try it. Thank you!

@ryan-bever
Copy link
Contributor Author

@christiangenco, that is exactly what I am doing now, until I can submit the PR referenced above. In my case it works perfectly with server reboot.

@christiangenco
Copy link
Contributor

@beverku The node I was testing it on just crashed my worker again without restarting it. Do I need to do more than dokku docker-options:add test-app deploy --restart=unless-stopped? Is sudo reboot necessary too?

@josegonzalez
Copy link
Member

@beverku what do you mean by "crashed my worker"?

@christiangenco
Copy link
Contributor

@josegonzalez My worker process (which is a sidekiq worker defined in my procfile) died and didn't restart, even though I ran dokku docker-options:add test-app deploy --restart=unless-stopped (with my app name instead of test-app). I didn't reboot the machine after I ran that docker-options command.

I'm not sure how to check the logs to see why it crashed - this command:

docker logs -f `cat CONTAINER.worker.1`

just shows a gigantic stream of what looks like normal output.

@josegonzalez
Copy link
Member

What is the output of:

dokku docker-options:add test-app

@josegonzalez
Copy link
Member

It might be that --restart=unless-stopped doesnt restart on normal exit, and your process exited normally (that is, with exit code of zero).

@christiangenco
Copy link
Contributor

@josegonzalez The output of dokku docker-options:add dbinbox is blank.

It's entirely possible sidekiq killed itself without an exit code - I still can't track down why it's crashing.

@josegonzalez
Copy link
Member

err sorry, not add. Drop that and run the command.

@christiangenco
Copy link
Contributor

$ dokku docker-options dbinbox
Deploy options:
    --restart=unless-stopped
$

@michaelshobbs
Copy link
Member

@christiangenco Saw your addition to #1878, did the docker-options method work for you?

@josegonzalez
Copy link
Member

This is probably going to have to wait until 0.6.0 because no one tested the conditions I wanted them to test :(

@christiangenco
Copy link
Contributor

@michaelshobbs Even with the docker-options set the worker processes still crash without resuming :/

@michaelshobbs
Copy link
Member

Seems like your good to go, yea?

@christiangenco
Copy link
Contributor

@michaelshobbs seems like it so far! If all goes well I expect my d1 worker process to eventually die under heavy load but my d2 process to keep going. If that happens, I'll know that restart=unless-stopped made the difference and will apply that to d1.

Thanks for your help!

@michaelshobbs
Copy link
Member

When using docker-options:add <app> --restart=unless-stopped with docker-app.json pre/post-deploy scripts, the pre & post-deploy containers continually restart. I'm thinking we'll just attach the policy on the deployed container inside of dokku instead.

Furthermore, I think unless-stopped is the right policy by default. However, can anyone think of a reason one might want a different policy? Seems to me that could wreak havoc on a system. Additionally, if we use the unless-stopped policy, I believe dokku would have to pin to docker 1.10+. Am I wrong there?
https://docs.docker.com/engine/reference/run/#restart-policies-restart

@ryan-bever
Copy link
Contributor Author

I'm sorry this completely fell off my radar, do you still want me to work on a PR or are you working on it @michaelshobbs ? I believe you have to pin to docker 1.9+

@alanjds
Copy link
Contributor

alanjds commented Mar 31, 2016

@beverku I do use Heroku. When an app crashes, it restarts it back, but if it crashes "too often" then is let crashed and mail if kept crashed for some time (hours, I guess).

For "too often" I feel it like "5 times per minute or 10 per hour, whatever comes first", but had not to find it at the docs right now.

This is a good compromise, as I don't want a broken container to be looped consuming bootup resources.

@ryan-bever
Copy link
Contributor Author

@alanjds what you describe is closer to the on-failure[:max-retries] docker restart policy.

But it seems like this is handled in a nice manner by docker in unless-stopped by:

"An ever increasing delay (double the previous delay, starting at 100 milliseconds) is added before each restart to prevent flooding the server. This means the daemon will wait for 100 ms, then 200 ms, 400, 800, 1600, and so on until either the on-failure limit is hit, or when you docker stop or docker rm -f the container."

see: https://docs.docker.com/engine/reference/run/#restart-policies-restart

@michaelshobbs
Copy link
Member

@beverku I'll put something together. should be a small change

@michaelshobbs
Copy link
Member

Ok thinking through this now.

Given that we auto restore all apps on boot, I'm thinking the --restart=on-failure:10 arg should be passed to only the deployed containers. I'd like to go with a sane setting here and not expose the ability to change.

Thoughts on this?

@josegonzalez
Copy link
Member

Sounds fine to me. We can make policies configurable in a future release.

@christiangenco
Copy link
Contributor

@epixa: @christiangenco Have you encountered any crashes since your comment?

Nothing's crashed so far, and d2$ docker inspect dbinbox.worker.1 shows "RestartCount": 16, so that appears to be working.

Strangely, I haven't changed any settings on d1 yet, and that hasn't crashed either. My hunch is that d1 only crashes if... it gets overloaded by d2 crashing? Not sure why only d2 would be crashing, though.

@paulwalker
Copy link

Is there any method by which I can chain restarts? One app will crash if another is not running, etc. Has anyone handled this use case before?

@josegonzalez
Copy link
Member

@paulwalker What do you mean, one app will crash if another is not running?

@paulwalker
Copy link

@josegonzalez my app will exit with an error if it cannot successfully connect to another app running mongodb (configured via the dokku mongodb plugin).

@josegonzalez
Copy link
Member

Can you hop on slack/irc so we can chat about this? I assume you're trying to test the above solution, correct?

@paulwalker
Copy link

I haven't tested anything yet, just researching the proper way to have everything start up again after system reboot. submitted for invite on slack...

@josegonzalez josegonzalez modified the milestones: v0.6.0, v0.7.0 Jun 23, 2016
@josegonzalez
Copy link
Member

josegonzalez commented Jun 24, 2016

I think the way to go here is to set a config var like DOKKU_RESTART_POLICY. This variable could contain the exact restart policy being applied to an application. An alternative is to use the docker-options plugin.

Why not just hook into the existing docker-options plugin like we do for storage? Because we need a default.

Valid restart policies would match the following bash regular expression:

^(no|unless-stopped|always|on-failure(:[1-9][0-9]*)?)$

If you unset that config variable - or if it did not yet exist - it would be reset to the initial default of on-failure:10. Removing the docker-options entry would simply remove all restart-policies. Restart policies would apply to all processes being managed.

Potential porcelain around the plumbing:

# show the restart policy
dokku ps:restart-policy APP

# set the restart policy for a given application
dokku ps:restart-policy APP on-failure:10

@michaelshobbs thoughts?

@josegonzalez josegonzalez changed the title Revisit docker restart policy [RFC] Revisit docker restart policy Jun 25, 2016
@josegonzalez josegonzalez self-assigned this Jul 2, 2016
josegonzalez added a commit that referenced this issue Jul 2, 2016
Applications without a restart-policy will have their policies set to `on-failure:10`. Users can completely unset the restart-policy using the `docker-options` plugin, though this will be equivalent to setting it explicitly to `no`.

Restart policies must be explicitly set, and the following are all valid:

- no
- unless-stopped
- always
- on-failure
- on-failure:NUMBER
@josegonzalez
Copy link
Member

Closing as there is a pull request (#2290) up.

josegonzalez added a commit that referenced this issue Jul 2, 2016
Applications without a restart-policy will have their policies set to `on-failure:10`. Users can completely unset the restart-policy using the `docker-options` plugin, though this will be equivalent to setting it explicitly to `no`.

Restart policies must be explicitly set, and the following are all valid:

- no
- unless-stopped
- always
- on-failure
- on-failure:NUMBER
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants