Graceful restart with tableflip #787

Open
fxedel opened this issue Oct 2, 2023 · 2 comments
fxedel commented Oct 2, 2023

Version of KrakenD you are using

KrakenD Version: 2.4.3
Go Version: 1.20.6
Glibc Version: MUSL-1.2.4_(alpine-3.18.2)

Is your feature request related to a problem? Please describe.
Restarting KrakenD always causes a short downtime on that machine: the old process shuts down and closes its HTTP listen socket, and only then does a new process start, initialize, and begin listening. For an API gateway, high availability is usually desired.

Describe the solution you'd like
Implement graceful restarts via cloudflare/tableflip. A restart would work like this:

  • Send a signal to the current (old) KrakenD process, or notify it to restart in some other way.
  • The old KrakenD process spawns a new KrakenD process and passes it the HTTP listen socket as a file descriptor. The old process keeps running and serving HTTP requests.
  • The new process starts up and initializes. It then uses the listen socket it received as a file descriptor to start serving HTTP requests.
  • For a very short period of time, both processes serve requests.
  • The new process signals the old process that it has finished initializing and is ready to serve requests.
  • The old process shuts down.
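
The socket handover in the steps above can be sketched with the Go standard library alone. This is not tableflip's API and not KrakenD code, only a minimal illustration of the underlying mechanism, and it assumes a TCP listener. The names `inheritListener` and `handoverDemo` are hypothetical, and both roles run in a single process here so the example stays self-contained. In a real restart the `*os.File` would be passed to the child through `exec.Cmd.ExtraFiles`.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
)

// inheritListener rebuilds a net.Listener from an inherited file, which is
// what a freshly spawned process would do with a descriptor it received.
func inheritListener(f *os.File) (net.Listener, error) {
	// net.FileListener works on its own duplicate, so f can be closed.
	defer f.Close()
	return net.FileListener(f)
}

// handoverDemo plays both roles in one process: the "old" side opens the
// socket, the "new" side serves on the inherited copy after the original
// listener has been closed.
func handoverDemo() (string, error) {
	old, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	addr := old.Addr().String()

	tcp, ok := old.(*net.TCPListener)
	if !ok {
		old.Close()
		return "", errors.New("not a TCP listener")
	}
	// File returns a duplicate descriptor for the same listening socket.
	// A real restart would put it in exec.Cmd.ExtraFiles for the child.
	f, err := tcp.File()
	if err != nil {
		old.Close()
		return "", err
	}

	inherited, err := inheritListener(f)
	if err != nil {
		old.Close()
		return "", err
	}

	// The old side stops accepting. The socket itself stays open because
	// the inherited duplicate still refers to it.
	old.Close()

	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "new process")
	})}
	go srv.Serve(inherited)
	defer srv.Close()

	resp, err := http.Get("http://" + addr)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := handoverDemo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(body)
}
```

Because the listening socket is never closed during the handover, connections that arrive in between wait in the kernel's accept queue instead of being refused.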

If the new process fails during initialization, for example by panicking on an invalid config file or by exceeding a configurable startup timeout, the old process does not shut down and keeps serving requests. This ensures that a usable KrakenD process is running at all times.
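
The decision the old process has to make can be sketched as a single `select`. This is a hedged illustration, not tableflip's implementation. The function `awaitHandover`, the error `errStartupTimeout`, and the two channels are hypothetical names, and how the new process reports readiness or exit is left out.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errStartupTimeout is a hypothetical sentinel error for a startup that
// took longer than the configured timeout.
var errStartupTimeout = errors.New("new process did not become ready in time")

// awaitHandover blocks until the new process reports readiness, exits
// early, or exceeds the timeout. The old process may shut down only when
// nil is returned. On any error it should keep serving requests.
func awaitHandover(ready <-chan struct{}, exited <-chan error, timeout time.Duration) error {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case <-ready:
		return nil
	case err := <-exited:
		return fmt.Errorf("new process exited during startup: %w", err)
	case <-timer.C:
		return errStartupTimeout
	}
}

func main() {
	ready := make(chan struct{})
	exited := make(chan error, 1)

	// Simulate a new process that dies on an invalid config file.
	exited <- errors.New("invalid config file")

	if err := awaitHandover(ready, exited, time.Second); err != nil {
		fmt.Println("old process keeps serving:", err)
		return
	}
	fmt.Println("old process shuts down")
}
```

Treating an early exit and a timeout the same way keeps the rule simple: the old process gives up the socket only after an explicit readiness signal.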

This graceful restart strategy is inspired by nginx reloads; see Cloudflare's blog post.

Describe alternatives you've considered
The documentation recommends using blue/green deployments. While this can be straightforward in a Kubernetes or cloud setup, it is not usable in every situation. A simple built-in graceful restart, like the one nginx offers, would make it possible to update the configuration with zero downtime and without changing anything in the server infrastructure. I see this as an additional restart option, so that users can pick whichever approach suits their setup best.

This issue is marked as stale because it has been open over 90 days with no activity. Remove the stale label or comment or this will be closed in 15 days.

@github-actions github-actions bot added the stale label Jan 15, 2024
@alombarte alombarte removed the stale label Jan 16, 2024
This issue is marked as stale because it has been open over 90 days with no activity. Remove the stale label or comment or this will be closed in 15 days.
