Support passing FDs (socket activation) #6296

Open
flokli opened this issue May 3, 2024 · 21 comments
Labels
discussion 💬 The right solution needs to be found feature ⚙️ New feature or request

Comments

@flokli

flokli commented May 3, 2024

I'd like to use caddy in socket-activated environments, using FDs passed down from the service manager rather than binding to addresses itself.

Combined with signalling readiness (which caddy already does), this gives zero-downtime (re)deployments on Linux systems using systemd (if .socket files are used), simply by restarting the process: the socket is held open by systemd, and new connections are passed in once caddy is ready to accept new requests. In these cases, complicated reload logic would no longer be needed.

github.com/coreos/go-systemd/activation provides the necessary methods to check whether FDs were passed, including identifying them by their socket name. https://vincent.bernat.ch/en/blog/2018-systemd-golang-socket-activation gives a good introduction to the feature itself.
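As a rough illustration of what that check involves, here is a stdlib-only Go sketch of the environment-variable convention described in sd_listen_fds(3): `LISTEN_PID` must match the current process, `LISTEN_FDS` gives the number of passed descriptors, and they start at FD 3. The names `passedFDs` and `errNotActivated` are hypothetical; in practice the `activation` package linked above would be the more common choice.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// listenFDsStart is the first descriptor number systemd hands over
// (SD_LISTEN_FDS_START in sd-daemon.h).
const listenFDsStart = 3

// errNotActivated signals that no sockets were passed to this process.
var errNotActivated = errors.New("not socket-activated")

// passedFDs returns the descriptor numbers passed by the service manager.
// getenv and pid are parameters so the logic can be tested without real FDs.
func passedFDs(getenv func(string) string, pid int) ([]int, error) {
	pidStr, countStr := getenv("LISTEN_PID"), getenv("LISTEN_FDS")
	if pidStr == "" || countStr == "" {
		return nil, errNotActivated
	}
	// The variables may have been inherited from a parent process; they only
	// apply if LISTEN_PID names this exact process.
	if p, err := strconv.Atoi(pidStr); err != nil || p != pid {
		return nil, errNotActivated
	}
	n, err := strconv.Atoi(countStr)
	if err != nil || n < 0 {
		return nil, fmt.Errorf("invalid LISTEN_FDS value %q", countStr)
	}
	fds := make([]int, n)
	for i := range fds {
		fds[i] = listenFDsStart + i
	}
	return fds, nil
}

func main() {
	fds, err := passedFDs(os.Getenv, os.Getpid())
	if err != nil {
		fmt.Println("falling back to binding sockets ourselves:", err)
		return
	}
	fmt.Println("inherited FDs:", fds)
}
```

The PID check is also what makes detection safe: a child process that merely inherits the variables will not mistake itself for the socket-activated service.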

If no explicit listen addresses are specified and caddy detects that it's running in such an environment, it could default to using the passed FDs rather than binding on its own.
Additionally, the Caddyfile could be extended to allow specifying these passed FDs as network addresses (something like sd-listen:$name or sd-listen:$idx). This becomes useful when you want to expose different things on different sockets.
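A minimal sketch of how such `sd-listen:` addresses might be parsed, assuming the syntax suggested above (either a socket name or a zero-based index). `parseSDListen` and `sdListenAddr` are hypothetical names, not part of Caddy.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sdListenAddr is a parsed form of the proposed sd-listen: address: either a
// socket name (Name set, Index -1) or a zero-based index (Name empty).
type sdListenAddr struct {
	Name  string
	Index int
}

const sdListenPrefix = "sd-listen:"

// parseSDListen parses "sd-listen:http" or "sd-listen:0". ok is false for
// ordinary addresses, which would keep being bound as they are today.
func parseSDListen(addr string) (sdListenAddr, bool, error) {
	if !strings.HasPrefix(addr, sdListenPrefix) {
		return sdListenAddr{}, false, nil
	}
	ref := strings.TrimPrefix(addr, sdListenPrefix)
	if ref == "" {
		return sdListenAddr{}, true, fmt.Errorf("empty socket reference in %q", addr)
	}
	// Purely numeric references are treated as indexes into the passed FDs.
	if idx, err := strconv.Atoi(ref); err == nil {
		if idx < 0 {
			return sdListenAddr{}, true, fmt.Errorf("negative index in %q", addr)
		}
		return sdListenAddr{Index: idx}, true, nil
	}
	return sdListenAddr{Name: ref, Index: -1}, true, nil
}

func main() {
	for _, s := range []string{"sd-listen:http", "sd-listen:1", ":443"} {
		a, ok, err := parseSDListen(s)
		fmt.Printf("%-16s ok=%v parsed=%+v err=%v\n", s, ok, a, err)
	}
}
```

One consequence of this design: treating numeric references as indexes means a socket literally named `1` could not be addressed by name.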

@francislavoie
Member

francislavoie commented May 3, 2024

Are you looking for the bind directive? https://caddyserver.com/docs/caddyfile/directives/bind

And see https://caddyserver.com/docs/conventions#network-addresses, you can use unix sockets in reverse_proxy upstreams.

I'm not sure what you're asking for if not that.

@mholt
Member

mholt commented May 3, 2024

It sounds like what is being asked for is graceful upgrades/restarts.

Caddy 1 had this feature, and I quite liked how it worked: pass the socket directly to the next process. It worked on all Unix systems without relying on a separate system service, and it was smart enough to understand Caddy configuration: if the new config didn't use a socket, it wouldn't be kept, rather than blindly moving all the sockets over.

I'd probably rather bring the implementation from Caddy 1 into Caddy 2.

@mholt mholt added feature ⚙️ New feature or request discussion 💬 The right solution needs to be found labels May 3, 2024
@flokli
Author

flokli commented May 3, 2024

> It sounds like what is being asked for is graceful upgrades/restarts.

No, getting that for free is only one side effect of supporting socket activation.

Socket activation also causes caddy to be started lazily when the first connection to the (externally configured) socket address arrives, which simplifies declaring service dependencies too.

The article I linked elaborates a bit more on this.

> Caddy 1 had this feature, and I quite liked how it worked: pass the socket directly to the next process.

This still requires caddy to coordinate manually with its new process and pass the socket around explicitly. The point of simply taking the FDs passed by the service manager is that caddy does not have to be aware of whether it's the first process being started on the system or a new version started with another config. caddy simply gets an FD on which new connections will appear.

@flokli
Author

flokli commented May 3, 2024

Ah yes, and because caddy just takes FDs, it doesn't need to bind() on its own, which allows applying stronger sandboxing from the outside.

@mholt
Member

mholt commented May 3, 2024

@flokli

> This still requires caddy to coordinate manually with its new process and pass the socket around explicitly. The point of simply taking the FDs passed by the service manager is that caddy does not have to be aware of whether it's the first process being started on the system or a new version started with another config. caddy simply gets an FD on which new connections will appear.

But what is Caddy supposed to do with that socket? How does it know the configuration associated with it? You can't just hand a server a socket and expect it to know what to do with it, without any configuration... maybe I am missing something about how it works.

@flokli
Author

flokli commented May 3, 2024

Sockets can have names attached (so the user can name them http and https, for example, or api and metrics), and we could add a syntax to refer to them by these names in the Caddyfile. I could say I want an HTTP server on sd-listen:http, which would then expect a listener named http to be passed to caddy.

All these passed FDs can also be wrapped in a net.Listener (for stream sockets), so even without explicit config caddy could inspect their properties and apply some heuristics (e.g. detect ports 80 and 443 if it gets two unnamed TCP sockets), if we want some out-of-the-box behaviour in these scenarios. But getting basic support for it (using an externally passed FD by its name/index) and defining the syntax for it would be a nice first step.

You can play around with this through systemd-socket-activate -l 8088 -l 8089 --fdname=foo:bar -- /path/to-caddy, which will give you two TCP sockets listening on the two ports, named foo and bar.
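For that invocation, the started process sees `LISTEN_FDS=2` and `LISTEN_FDNAMES=foo:bar`, so FD 3 is named `foo` and FD 4 is named `bar`. A sketch of grouping descriptors by name (hypothetical `fdsByName`), assuming the colon-separated format and the `unknown` fallback described in sd_listen_fds_with_names(3):

```go
package main

import (
	"fmt"
	"strings"
)

// listenFDsStart is the first passed descriptor (SD_LISTEN_FDS_START).
const listenFDsStart = 3

// fdsByName groups passed descriptors by the names in LISTEN_FDNAMES, which
// lists one name per FD, colon-separated, in FD order. A name may repeat
// (several sockets from one .socket unit), so each name maps to a slice.
func fdsByName(listenFDs int, listenFDNames string) (map[string][]int, error) {
	if listenFDs < 0 {
		return nil, fmt.Errorf("invalid FD count %d", listenFDs)
	}
	names := make([]string, listenFDs)
	if listenFDNames == "" {
		// No names set: label every FD "unknown", mirroring sd_listen_fds_with_names(3).
		for i := range names {
			names[i] = "unknown"
		}
	} else {
		names = strings.Split(listenFDNames, ":")
		if len(names) != listenFDs {
			return nil, fmt.Errorf("LISTEN_FDNAMES has %d names for %d FDs", len(names), listenFDs)
		}
	}
	byName := make(map[string][]int)
	for i, name := range names {
		byName[name] = append(byName[name], listenFDsStart+i)
	}
	return byName, nil
}

func main() {
	// The values that systemd-socket-activate -l 8088 -l 8089 --fdname=foo:bar sets.
	byName, err := fdsByName(2, "foo:bar")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(byName)
}
```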

@mholt
Member

mholt commented May 3, 2024

Oh I see, so you'd still have your Caddy config, you'd just specify a different network name for the listener address, and Caddy will then get it from the service manager rather than binding a new socket.

@flokli
Author

flokli commented May 3, 2024

Yes! Or rather, I don't want caddy to do any binding on its own at all; every socket should be passed in via this mechanism.

@mholt
Member

mholt commented May 3, 2024

In that case you can use bind in your site blocks to get the socket from the service manager. We'd just need to implement a package that calls caddy.RegisterNetwork(). For example the caddy-tailscale package does this so that Tailscale can provide a listener.

Anyone is welcome to pick this up.

@WeidiDeng
Member

@mholt I did some experiments with registering a custom network; it's too much trouble to be worth it. Every site block needs an explicit bind, and that includes the HTTP port and the HTTP/3 UDP socket.

@flokli I'm thinking that on Unix, we could try preferring socket activation but fall back to the old behavior. What do you think of that? Or should caddy just exit unsuccessfully if socket activation environment variables are found but no sockets matching the listening criteria are found? Or should it just emit some warning logs?

As mentioned above, you are responsible for passing every socket yourself, including 80/tcp and 443/udp if automatic HTTP->HTTPS redirects and HTTP/3 are enabled, respectively, as well as the admin socket if it is enabled. This assumes you restart caddy instead of reloading it.

@climba03003

I would really like to see this happen; it could greatly reduce my network stack complexity.
Currently, I have two Caddy instances in front of the server, and I face a lot of instability because of podman networking.
I switched to using sockets to see if that works better (no more DNS resolution).

```mermaid
flowchart TD
    A[Caddy] -->|Reverse Proxy| B{Container Network}
    B -->|Serve Frontend| C[Caddy]
    C -->|Reverse Proxy| D[Server]
```

Once socket activation is supported, it can also reduce resource usage, because the middle caddy can be terminated when no one has connected for some time. If the outer one can be socket-activated, it will directly pass the socket to the inner one and benefit from a direct network connection.

@flokli
Author

flokli commented May 6, 2024

> @mholt I did some experiments with registering a custom network; it's too much trouble to be worth it. Every site block needs an explicit bind, and that includes the HTTP port and the HTTP/3 UDP socket.

> @flokli I'm thinking that on Unix, we could try preferring socket activation but fall back to the old behavior. What do you think of that? Or should caddy just exit unsuccessfully if socket activation environment variables are found but no sockets matching the listening criteria are found? Or should it just emit some warning logs?

I think it makes sense to first land the feature with explicit configuration, which might mean explicit bind statements, and once that's in, think about more opinionated defaults in case we are in a socket-activated environment.

The good thing is, it's pretty safe to detect whether caddy is running in a socket-activated environment or not, so we can change defaults in this case without breaking existing use cases.

@WeidiDeng
Member

@flokli So that means you're fine with mixing passed FDs and the current binding behavior? And since you will use bind explicitly, it's an error to bind to a nonexistent FD.

The problem with names is that one name can map to many sockets with different addresses. How do you think caddy should handle this situation?

@eliasp

eliasp commented May 6, 2024

Until this is implemented: those who just care about binding to ports <1024 AND not running Caddy as root can use systemd's SocketBindAllow= (available since systemd 249).

@mohammed90
Member

> those who just care about binding to ports <1024 AND not running Caddy as root

There was never a need to run Caddy as root on Linux. Our standard systemd unit file ships with CAP_NET_BIND_SERVICE, which allows the service to run without root. SocketBindAllow= and SocketBindDeny= allow further restriction to specific ports rather than any port below 1024.

@flokli
Author

flokli commented May 8, 2024

I'm aware of CAP_NET_BIND_SERVICE to allow non-root processes to bind to lower ports, that's not why I'm advocating for this feature.

Giving the option to move the whole socket binding business entirely out of caddy is what I'm advocating for, both for sandboxing (caddy doesn't need to be allowed to bind() if it doesn't have to; it doesn't even need access to the network namespace the bind happens in) and for zero-downtime restarts/configuration updates controlled by the service manager.

> @flokli So that means you're fine with mixing passed FDs and the current binding behavior? And since you will use bind explicitly, it's an error to bind to a nonexistent FD.

Yes, I think the bind syntax should be extended, to allow specifying "use this passed FD rather than binding yourself". Slightly unfortunate name, but well 🤷.

This would also mean caddy would still bind on its own wherever we don't explicitly configure it to use the FD(s).

> The problem with names is that one name can map to many sockets with different addresses. How do you think caddy should handle this situation?

Indeed, FileDescriptorName= applies that name to all sockets in the .socket file, so sd-listen:http could identify multiple FDs, not just a single one.

I think I'd be fine with first landing support that requires explicit bind statements everywhere and working out the syntax for it; once that's stabilized, I'd think about what a nice out-of-the-box behaviour could look like if caddy detects it is running in a socket-activated scenario.

@WeidiDeng
Member

@flokli You can try it with a plugin for now, xcaddy build --with github.com/WeidiDeng/caddy-socket-activation. Let me know what you think.

@caddyserver caddyserver deleted a comment from Karanvarm May 13, 2024
@balki

balki commented May 22, 2024

I wrote a small go library to listen on socket activated fds.
https://github.com/balki/anyhttp/blob/main/anyhttp.go#L147

@francislavoie
Member

Interesting, this could be turned into a Caddy plugin by using caddy.RegisterNetwork() @balki

@flokli
Author

flokli commented May 25, 2024

> Interesting, this could be turned into a Caddy plugin by using caddy.RegisterNetwork() @balki

Isn't that exactly what @WeidiDeng's plugin already does? https://github.com/WeidiDeng/caddy-socket-activation/blob/2246ae4a7a00955926ebdf1d557c1530b327bf2f/tcp.go#L12

(I'll try it now, was quite busy before. Did not forget, sorry! Will report back here)

> I wrote a small go library to listen on socket activated fds. https://github.com/balki/anyhttp/blob/main/anyhttp.go#L147

There's also github.com/coreos/go-systemd/activation, linked in the initial issue description, which seems a bit more commonly used.

@flokli
Author

flokli commented May 25, 2024

I gave https://github.com/WeidiDeng/caddy-socket-activation a try (after WeidiDeng/caddy-socket-activation#1).

Some notes:

I'm using caddy alongside the acme-dns plugin, so I don't really need to bind on port 80. However, it seems to be quite hard (?) to avoid binding there (it requires disabling automatic HTTPS) or to reconfigure it to another bind using the new socket-activation functionality (is this even documented at all?).

I ignored the HTTP problem (letting caddy bind port 80 on its own) and configured a .socket file binding port 443 on TCP (and UDP, for HTTP/3):

```ini
# /etc/systemd/system/caddy.socket
[Unit]

[Socket]
FileDescriptorName=https
ListenDatagram=[::]:443
ListenStream=[::]:443

[Install]
WantedBy=sockets.target
```

systemd starts caddy on the first connection; however, caddy fails to pick up the FD names, at least in some cases. I managed to reproduce the flaky behaviour with systemd-socket-activate too:

```console
[root@n4-rk1:~]# systemd-socket-activate -l '[::]:443' --fdname https /nix/store/fdj05bym30gxrqsn66g1q709d0pirirj-caddy-2.7.6/bin/caddy run --config ./caddy_config --adapter caddyfile
Listening on [::]:443 as 3.
Communication attempt on fd 3.
Execing /nix/store/fdj05bym30gxrqsn66g1q709d0pirirj-caddy-2.7.6/bin/caddy (/nix/store/fdj05bym30gxrqsn66g1q709d0pirirj-caddy-2.7.6/bin/caddy run --config ./caddy_config --adapter caddyfile)
2024/05/25 14:43:57.224	INFO	using provided configuration	{"config_file": "./caddy_config", "config_adapter": "caddyfile"}
2024/05/25 14:43:57.232	WARN	Caddyfile input is not formatted; run 'caddy fmt --overwrite' to fix inconsistencies	{"adapter": "caddyfile", "file": "./caddy_config", "line": 3}
2024/05/25 14:43:57.234	WARN	admin	admin endpoint disabled
2024/05/25 14:43:57.235	INFO	tls.cache.maintenance	started background certificate maintenance	{"cache": "0x400053a180"}
2024/05/25 14:43:57.236	INFO	http.auto_https	server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS	{"server_name": "srv0", "https_port": 443}
2024/05/25 14:43:57.236	INFO	http.auto_https	enabling automatic HTTP->HTTPS redirects	{"server_name": "srv0"}
2024/05/25 14:43:57.238	WARN	tls	unable to get instance ID; storage clean stamps will be incomplete	{"error": "open /root/.local/share/caddy/instance.uuid: no such file or directory"}
2024/05/25 14:43:57.238	INFO	http.log	server running	{"name": "remaining_auto_https_redirects", "protocols": ["h1", "h2", "h3"]}
2024/05/25 14:43:57.238	INFO	tls.cache.maintenance	stopped background certificate maintenance	{"cache": "0x400053a180"}
Error: loading initial config: loading new config: http app module: start: listening on socket-activation/https:443: no file descriptors passed

[root@n4-rk1:~]# systemd-socket-activate -l '[::]:443' --fdname https /nix/store/fdj05bym30gxrqsn66g1q709d0pirirj-caddy-2.7.6/bin/caddy run --config ./caddy_config --adapter caddyfile
Listening on [::]:443 as 3.
Communication attempt on fd 3.
Execing /nix/store/fdj05bym30gxrqsn66g1q709d0pirirj-caddy-2.7.6/bin/caddy (/nix/store/fdj05bym30gxrqsn66g1q709d0pirirj-caddy-2.7.6/bin/caddy run --config ./caddy_config --adapter caddyfile)
2024/05/25 14:44:10.173	INFO	using provided configuration	{"config_file": "./caddy_config", "config_adapter": "caddyfile"}
2024/05/25 14:44:10.176	WARN	Caddyfile input is not formatted; run 'caddy fmt --overwrite' to fix inconsistencies	{"adapter": "caddyfile", "file": "./caddy_config", "line": 3}
2024/05/25 14:44:10.176	WARN	admin	admin endpoint disabled
2024/05/25 14:44:10.177	INFO	tls.cache.maintenance	started background certificate maintenance	{"cache": "0x4000551200"}
2024/05/25 14:44:10.177	INFO	http.auto_https	server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS	{"server_name": "srv0", "https_port": 443}
2024/05/25 14:44:10.179	INFO	http.auto_https	enabling automatic HTTP->HTTPS redirects	{"server_name": "srv0"}
2024/05/25 14:44:10.182	INFO	http	enabling HTTP/3 listener	{"addr": "https:443"}
2024/05/25 14:44:11.945	INFO	[INFO][FileStorage:/root/.local/share/caddy] /root/.local/share/caddy/locks/storage_clean.lock: Empty lockfile (EOF) - likely previous process crashed or storage medium failure; treating as stale
2024/05/25 14:44:11.945	INFO	[INFO][FileStorage:/root/.local/share/caddy] Lock for 'storage_clean' is stale (created: 0001-01-01 00:00:00 +0000 UTC, last update: 0001-01-01 00:00:00 +0000 UTC); removing then retrying: /root/.local/share/caddy/locks/storage_clean.lock
2024/05/25 14:44:11.950	INFO	tls	cleaning storage unit	{"storage": "FileStorage:/root/.local/share/caddy"}
2024/05/25 14:44:11.951	INFO	tls	finished cleaning storage units
2024/05/25 14:44:13.995	INFO	tls.cache.maintenance	stopped background certificate maintenance	{"cache": "0x4000551200"}
Error: loading initial config: loading new config: http app module: start: starting HTTP/3 QUIC listener: listen udp: lookup https: no such host
```

This might also be related to systemd-socket-activate not allowing listening on TCP and UDP simultaneously, but I got the same messages when running it as a systemd service:

```text
May 25 14:27:01 n4-rk1 caddy[253966]: {"level":"warn","ts":1716647221.6430447,"logger":"admin","msg":"admin endpoint disabled"}
May 25 14:27:01 n4-rk1 caddy[253966]: {"level":"info","ts":1716647221.6433222,"logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0x4000393200"}
May 25 14:27:01 n4-rk1 caddy[253966]: {"level":"info","ts":1716647221.643876,"logger":"http.auto_https","msg":"server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS","server_name":"srv0","https_port":443}
May 25 14:27:01 n4-rk1 caddy[253966]: {"level":"info","ts":1716647221.643922,"logger":"http.auto_https","msg":"enabling automatic HTTP->HTTPS redirects","server_name":"srv0"}
May 25 14:27:01 n4-rk1 caddy[253966]: {"level":"info","ts":1716647221.64457,"logger":"http","msg":"enabling HTTP/3 listener","addr":"https:443"}
May 25 14:27:01 n4-rk1 caddy[253966]: {"level":"warn","ts":1716647221.6502545,"logger":"tls","msg":"storage cleaning happened too recently; skipping for now","storage":"FileStorage:/var/lib/caddy/.local/share/caddy","instance":"d733cc11-103e-4a8e-9d6e-24d4a45822da","try_again":1716733621.650252,"try_again_in":86399.999999416}
May 25 14:27:01 n4-rk1 caddy[253966]: {"level":"info","ts":1716647221.6504123,"logger":"tls","msg":"finished cleaning storage units"}
May 25 14:27:05 n4-rk1 caddy[253966]: {"level":"info","ts":1716647225.4954324,"logger":"tls.cache.maintenance","msg":"stopped background certificate maintenance","cache":"0x4000393200"}
May 25 14:27:05 n4-rk1 caddy[253966]: Error: loading initial config: loading new config: http app module: start: starting HTTP/3 QUIC listener: listen udp: lookup https: no such host
May 25 14:27:05 n4-rk1 systemd[1]: caddy.service: Main process exited, code=exited, status=1/FAILURE
May 25 14:27:05 n4-rk1 systemd[1]: caddy.service: Failed with result 'exit-code'.
May 25 14:27:05 n4-rk1 systemd[1]: Failed to start Caddy.
May 25 14:27:05 n4-rk1 systemd[1]: Starting Caddy...
May 25 14:27:05 n4-rk1 caddy[253982]: {"level":"info","ts":1716647225.650479,"msg":"using provided configuration","config_file":"/etc/caddy/caddy_config","config_adapter":"caddyfile"}
May 25 14:27:05 n4-rk1 caddy[253982]: {"level":"warn","ts":1716647225.654428,"msg":"Caddyfile input is not formatted; run 'caddy fmt --overwrite' to fix inconsistencies","adapter":"caddyfile","file":"/etc/caddy/caddy_config","line":3}
May 25 14:27:05 n4-rk1 caddy[253982]: {"level":"warn","ts":1716647225.6552162,"logger":"admin","msg":"admin endpoint disabled"}
May 25 14:27:05 n4-rk1 caddy[253982]: {"level":"info","ts":1716647225.6555943,"logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0x4000494880"}
May 25 14:27:05 n4-rk1 caddy[253982]: {"level":"info","ts":1716647225.6560173,"logger":"http.auto_https","msg":"server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS","server_name":"srv0","https_port":443}
May 25 14:27:05 n4-rk1 caddy[253982]: {"level":"info","ts":1716647225.6560597,"logger":"http.auto_https","msg":"enabling automatic HTTP->HTTPS redirects","server_name":"srv0"}
May 25 14:27:05 n4-rk1 caddy[253982]: {"level":"info","ts":1716647225.6566808,"logger":"http.log","msg":"server running","name":"remaining_auto_https_redirects","protocols":["h1","h2","h3"]}
May 25 14:27:05 n4-rk1 caddy[253982]: {"level":"info","ts":1716647225.6567864,"logger":"tls.cache.maintenance","msg":"stopped background certificate maintenance","cache":"0x4000494880"}
May 25 14:27:05 n4-rk1 caddy[253982]: Error: loading initial config: loading new config: http app module: start: listening on socket-activation/https:443: no file descriptors passed
May 25 14:27:05 n4-rk1 systemd[1]: caddy.service: Main process exited, code=exited, status=1/FAILURE
```

8 participants