lnd should exit with no-success error code when automatically shut down due to bitcoind problems #5625

Open
Talkless opened this issue Aug 12, 2021 · 6 comments · May be fixed by #8659
Labels
beginner Issues suitable for new developers bitcoind Bitcoin Core backend error codes good first issue Issues suitable for first time contributors to LND linux P3 might get fixed, nice to have


@Talkless

Background

While troubleshooting Tor and bitcoind issues, I restarted bitcoind twice in a row and discovered (purely by accident, since I had the bitcoind and lnd logs tailed in the same tmux split screen) that lnd had shut itself down:

Aug 12 12:55:25 odroid-hc1 lnd[5606]: 2021-08-12 12:55:25.120 [ERR] NTFN: Unable to fetch block header: Post "http://127.0.0.1:8332": dial tcp 127.0.0.1:8332: connect: connection refused
Aug 12 12:55:25 odroid-hc1 lnd[5606]: 2021-08-12 12:55:25.304 [INF] CRTR: Pruning channel graph using block 0000000000000000000c7620d7807edfad1c475f204f5c17da4601e2a6e13945 (height=695396)
Aug 12 12:55:37 odroid-hc1 lnd[5606]: 2021-08-12 12:55:37.500 [INF] DISC: GossipSyncer(02ad6fb8d693dc1e4569bcedefadf5f72a931ae027dc0f0c544b34c1c6f3b9a02b): applying gossipFilter(start=0001-01-01 00:00:00 +0000 UTC, end=0001-01-01 00:00:00 +0000 UTC)
Aug 12 12:55:37 odroid-hc1 lnd[5606]: 2021-08-12 12:55:37.501 [INF] DISC: GossipSyncer(036d2ac71176151db04fdac839a0ddea9f3a584f6c23bb0b4ac72c323124ec506b): applying gossipFilter(start=2021-08-12 12:55:37.501421413 +0300 EEST m=+1873248.185437149, end=2157-09-18 19:23:52.501421413 +0300 EEST)
Aug 12 12:56:21 odroid-hc1 lnd[5606]: 2021-08-12 12:56:21.460 [INF] HLCK: Health check: chain backend, call: 2 failed with: -28: Verifying blocks..., backing off for: 2m0s
Aug 12 12:56:37 odroid-hc1 lnd[5606]: 2021-08-12 12:56:37.501 [INF] DISC: Broadcasting 49 new announcements in 5 sub batches
Aug 12 12:58:21 odroid-hc1 lnd[5606]: 2021-08-12 12:58:21.476 [CRT] SRVR: Health check: chain backend failed after 3 calls
Aug 12 12:58:21 odroid-hc1 lnd[5606]: 2021-08-12 12:58:21.476 [INF] SRVR: Sending request for shutdown
Aug 12 12:58:21 odroid-hc1 lnd[5606]: 2021-08-12 12:58:21.516 [INF] LTND: Received shutdown request.
Aug 12 12:58:21 odroid-hc1 lnd[5606]: 2021-08-12 12:58:21.517 [INF] LTND: Shutting down...
Aug 12 12:58:21 odroid-hc1 lnd[5606]: 2021-08-12 12:58:21.517 [INF] LTND: Gracefully shutting down.
...
Aug 12 12:59:58 odroid-hc1 lnd[5606]: 2021-08-12 12:59:58.385 [INF] LTND: Shutdown complete
Aug 12 12:59:58 odroid-hc1 systemd[1]: lnd.service: Succeeded.

systemctl status now shows:

   Loaded: loaded (/etc/systemd/system/lnd.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Thu 2021-08-12 12:59:58 EEST; 9min ago
  Process: 5587 ExecStartPre=/usr/local/bin/lndcache.sh (code=exited, status=0/SUCCESS)
  Process: 5606 ExecStart=/home/lnd/lnd.go/bin/lnd --profile=9000 (code=exited, status=0/SUCCESS)
 Main PID: 5606 (code=exited, status=0/SUCCESS)

So it exited with SUCCESS status, which means systemd will not restart lnd after this bail-out even though I have Restart=on-failure set, because the exit was not reported as a failure. This creates a risk of "silently" losing Lightning functionality.

Your environment

  • lnd: v0.13.1-beta
  • OS: Linux odroid-hc1 4.19.0-17-armmp-lpae #1 SMP Debian 4.19.194-3 (2021-07-18) armv7l GNU/Linux
  • bitcoind: v0.21.1

Steps to reproduce

Keep restarting bitcoind until lnd shuts itself down and exits with a success status.

Expected behaviour

The lnd process should exit with a non-zero exit code.

Actual behaviour

lnd exits with a success (zero) exit code.
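
A minimal sketch of the requested behaviour, written in Go since lnd is a Go program. This is not lnd's actual shutdown code: `shutdownReason`, `exitCodeFor`, and the channel wiring are hypothetical names used only for illustration. The idea is that the shutdown request carries its cause, so a shutdown triggered by a failed health check maps to a non-zero exit code that systemd's Restart=on-failure can act on.

```go
package main

import (
	"fmt"
	"os"
)

// shutdownReason is a hypothetical type recording why shutdown was requested.
type shutdownReason int

const (
	// reasonUserRequest models a clean, operator-initiated stop.
	reasonUserRequest shutdownReason = iota
	// reasonHealthCheckFailed models a shutdown forced by a failed health check.
	reasonHealthCheckFailed
)

// exitCodeFor maps a shutdown reason to a process exit code.
// Only an operator-initiated stop counts as success.
func exitCodeFor(r shutdownReason) int {
	if r == reasonHealthCheckFailed {
		return 1
	}
	return 0
}

func main() {
	shutdown := make(chan shutdownReason, 1)

	// Simulate the health check giving up on the chain backend.
	shutdown <- reasonHealthCheckFailed

	reason := <-shutdown

	// Graceful cleanup would run here, before the process exits.
	code := exitCodeFor(reason)
	fmt.Printf("shutdown complete, exiting with code %d\n", code)
	os.Exit(code)
}
```

With an exit code of 1, systemd would record the unit as failed instead of "Succeeded", and Restart=on-failure would apply.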

@Roasbeef Roasbeef added bitcoind Bitcoin Core backend error codes beginner Issues suitable for new developers linux labels Aug 12, 2021
@DarthBenro008

Hey! I would like to work on this!

@Roasbeef Roasbeef added the P3 might get fixed, nice to have label Aug 31, 2021
@Talkless
Author

Got another lnd auto-shutdown. This time I did nothing, i.e. bitcoind was still running:

Sep 20 13:34:12 odroid-hc1 lnd[3214]: 2021-09-20 13:34:12.812 [INF] HLCK: Health check: chain backend, call: 2 failed with: health check: chain backend timed out after: 30s, backing off for: 2m0s
...
Sep 20 13:36:42 odroid-hc1 lnd[3214]: 2021-09-20 13:36:42.859 [CRT] SRVR: Health check: chain backend failed after 3 calls

@sangaman
Contributor

sangaman commented Oct 5, 2021

The same thing happened to me recently: lnd shut down after "Health check: chain backend failed after 3 calls". I'm also using systemd to manage lnd, and my node was down for some time without my knowledge. My preference would be for lnd not to shut down when the backend is lagging, and instead to go into an idle state and wait for the backend to come back online. However, having it shut down with an error on health check failure, combined with Restart=on-failure, would be good enough for my needs. So +1 to this feature request.

@Talkless
Author

This issue repeats about 2-3 times per month:
[image]

@Talkless
Author

Oh, I see I can configure health checks:

[healthcheck]

Still, this issue stands: the process should exit with a non-success exit code.

@Talkless
Author

Any progress? lnd "died" while I was away from home, probably because bitcoind was overloaded by the huge mempool we have had recently:

May 09 23:12:03 odroid-hc1 lnd[2542]: 2023-05-09 23:12:03.347 [CRT] SRVR: Health check: chain backend failed after 10 calls
May 09 23:12:03 odroid-hc1 lnd[2542]: 2023-05-09 23:12:03.356 [INF] SRVR: Sending request for shutdown

@ziggie1984 ziggie1984 added the good first issue Issues suitable for first time contributors to LND label Apr 15, 2024
@mohamedawnallah mohamedawnallah linked a pull request Apr 17, 2024 that will close this issue