Graceful restart with tableflip #787

fxedel · 2023-10-02T08:48:39Z

Version of KrakenD you are using

KrakenD Version: 2.4.3
Go Version: 1.20.6
Glibc Version: MUSL-1.2.4_(alpine-3.18.2)

Is your feature request related to a problem? Please describe.
Restarting krakend always causes a short downtime on that machine: the old process shuts down and closes its HTTP listen socket, and only afterwards does a new process start up, initialize, and begin listening. For an API gateway, high availability is usually desired.

Describe the solution you'd like
Implement graceful restart via cloudflare/tableflip. The restart works like so:

1. Send a signal to the current (old) krakend process, or notify it to restart by any other means.
2. The old krakend process spawns a new krakend process and passes its HTTP listen socket to it as a file descriptor. The old process keeps running and serving HTTP requests.
3. The new process starts up and initializes. Finally, it uses the listen socket it received as a file descriptor to start serving HTTP requests.
4. For a very short period of time, both processes are serving requests.
5. The new process signals the old process that it has finished initialization and is ready to serve requests.
6. The old process shuts down.

If the new process fails during initialization, for example by panicking on an invalid config file or by exceeding a configurable startup timeout, the old process does not shut down and keeps serving requests. This ensures that a usable krakend process is running at all times.

This graceful restart strategy is in fact inspired by nginx reloads; see Cloudflare's blog post.

Describe alternatives you've considered
The documentation recommends using blue/green deployments. While this can be straightforward in a Kubernetes or cloud setup, it might not be usable in all situations. A simple built-in graceful restart, just like in nginx, would make it possible to update the configuration with zero downtime and without changing anything in the server infrastructure. I see this as an additional restart option, so that there are several options to choose from depending on the setup.


github-actions · 2024-01-15T10:20:19Z

This issue is marked as stale because it has been open over 90 days with no activity. Remove the stale label or comment or this will be closed in 15 days.

github-actions · 2024-04-16T10:18:07Z

This issue is marked as stale because it has been open over 90 days with no activity. Remove the stale label or comment or this will be closed in 15 days.

github-actions bot added the stale label Jan 15, 2024

alombarte removed the stale label Jan 16, 2024

github-actions bot added the stale label Apr 16, 2024

obokaman-com added roadmap and removed stale labels Apr 16, 2024
