Investigate shutdown handling from unhandled exceptions in main bot task #5780
Comments
It has been a while since this issue was posted; however, it seems to be exactly what happened twice to my bot.
As soon as the bot loaded, it threw an exception about being unable to connect to Discord. However, as the timestamps show, despite logging "Shutting down", the process does not exit. It keeps running, so systemd sees it as still active and does not restart it automatically. It wasn't until a few days later, when one of the mods noticed the bot was down, that it was manually restarted (hence the 3 messages at the end of the log).
Redbot version 3.4.18
I have had a similar situation to PewnyPL's: usually, when there is a power outage, the device running the bot comes up faster than the router, so there is no connection when the service that starts the bot runs.
[2023-07-16 09:17:14] [INFO] discord.client: logging in using static token
[2023-07-16 09:17:14] [CRITICAL] red.main: The main bot task didn't handle an exception and has crashed
Traceback (most recent call last):
File "/..[venv path]../lib/python3.9/site-packages/aiohttp/connector.py", line 1152, in _create_direct_connection
hosts = await asyncio.shield(host_resolved)
File "/..[venv path]../lib/python3.9/site-packages/aiohttp/connector.py", line 874, in _resolve_host
addrs = await self._resolver.resolve(host, port, family=self._family)
File "/..[venv path]../lib/python3.9/site-packages/aiohttp/resolver.py", line 33, in resolve
infos = await self._loop.getaddrinfo(
File "uvloop/loop.pyx", line 1528, in getaddrinfo
socket.gaierror: [Errno -2] Name or service not known
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/..[venv path]../lib/python3.9/site-packages/redbot/__main__.py", line 469, in red_exception_handler
red_task.result()
File "/..[venv path]../lib/python3.9/site-packages/redbot/__main__.py", line 369, in run_bot
await red.start(token)
File "/..[venv path]../lib/python3.9/site-packages/redbot/core/bot.py", line 1270, in start
await self.login(token)
File "/..[venv path]../lib/python3.9/site-packages/discord/client.py", line 612, in login
data = await self.http.static_login(token)
File "/..[venv path]../lib/python3.9/site-packages/discord/http.py", line 801, in static_login
data = await self.request(Route('GET', '/users/@me'))
File "/..[venv path]../lib/python3.9/site-packages/discord/http.py", line 624, in request
async with self.__session.request(method, url, **kwargs) as response:
File "/..[venv path]../lib/python3.9/site-packages/aiohttp/client.py", line 1141, in __aenter__
self._resp = await self._coro
File "/..[venv path]../lib/python3.9/site-packages/aiohttp/client.py", line 536, in _request
conn = await self._connector.connect(
File "/..[venv path]../lib/python3.9/site-packages/aiohttp/connector.py", line 540, in connect
proto = await self._create_connection(req, traces, timeout)
File "/..[venv path]../lib/python3.9/site-packages/aiohttp/connector.py", line 901, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
File "/..[venv path]../lib/python3.9/site-packages/aiohttp/connector.py", line 1166, in _create_direct_connection
raise ClientConnectorError(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host discord.com:443 ssl:default [Name or service not known]
[2023-07-16 09:17:14] [WARNING] red.main: Attempting to die as gracefully as possible...
[2023-07-16 09:17:14] [INFO] red.main: Shutting down from unhandled exception
[2023-07-16 10:44:42] [INFO] red.main: SIGTERM received. Quitting...
[2023-07-16 10:44:42] [INFO] red.main: Shutting down with exit code: 0 (SHUTDOWN)
[2023-07-16 10:44:42] [INFO] red.main: Please wait, cleaning up a bit more
Also note that the process actually terminated around 1h 30m later, despite logging "Shutting down...". This also turns into a request, depending on how this will be dealt with: I have set the service to not restart if the exit code is 0 (so that it doesn't restart when I manually shut it down through [p]shutdown). However, as seen above, the exit code is also 0 for this issue, which prevents the service from restarting despite the crash. It would be helpful if the exit code weren't 0 in these cases. Red v3.5.2, by the way.
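The request above amounts to making the process exit status distinguish a deliberate shutdown from a crash, so that a service manager set to restart only on failure can react. A minimal sketch of that idea, assuming a hypothetical `ExitCode` enum and `exit_code_for` helper (neither is Red's actual API; only the value 0 for SHUTDOWN is taken from the log above):

```python
from enum import IntEnum
from typing import Optional


class ExitCode(IntEnum):
    # Hypothetical values for illustration; only SHUTDOWN = 0 appears in the log.
    SHUTDOWN = 0
    CRITICAL = 1


def exit_code_for(error: Optional[BaseException]) -> ExitCode:
    """Return a non-zero code whenever an exception caused the shutdown."""
    if error is None:
        # Deliberate shutdown, e.g. requested by the bot owner.
        return ExitCode.SHUTDOWN
    # Any unhandled exception is reported as a failure to the service manager.
    return ExitCode.CRITICAL
```

With something along these lines, a service configured to restart only on non-zero exit would stay down after a manual shutdown but come back after a crash.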
Hi, I'd like to report that this is plaguing me too. I can also verify, by manually blocking discord.com in hosts, that this issue happens even without using systemd, so systemd is not the core issue here. I'd like to add that I am using the Postgres version of RedBot. Perhaps all the users above are also using Postgres? Judging by its presentation this should be a common issue, yet only 3 reports have been made so far. Tried but no effect:
Edit 1: I tried running it with the --no-cogs option, and it doesn't seem to have any effect.
Edit 2: Running the code with a profiler, it seems that the bot is stuck in the run_forever loop, and any attempts to stop it must have failed.
I found a workaround for this by adding |
This message is unrelated to this bug. This bug is about the bot shutdown process, not the bot starting up before |
The workaround I posted is 100% related to this bug. The bug is that the bot hangs instead of shutting down if discord.com is unreachable. Look at the last few lines of the traceback ajwset posted
I agree that the shown scenario is one of the cases that can produce this bug. It is possible that some additional care will be needed, since this specific case occurs for shutdowns during startup, but I would definitely want to properly handle shutdowns happening during startup. Note that a fix for this issue is unlikely to involve a reconnection strategy, which I'm guessing is why Yami said it doesn't relate to this bug: the bot would just actually shut down (with a proper exit code) rather than hang indefinitely. As far as workarounds go, this one does help with some of the cases listed in the comments on this issue; it just won't resolve the underlying issue of the bot hanging on shutdown if it happens for any reason other than discord.com being unreachable while the bot starts up. Based on this issue, there are some people in the community struggling with it, so I'm sure some will find this useful :)
I had a little poke around debugging with Visual Studio Code, and I found that this is the last line (of the bot's code) that gets executed before the hang: https://github.com/Cog-Creators/Red-DiscordBot/blob/7dfe24397ee40114f164c919e77a19c1be45cb5b/redbot/__main__.py#L445C9-L445C9
Yes, while it indeed doesn't fix the bug, it is related and worked as a good workaround for the issue (at least for what I was facing). Delaying the startup while the domain name can't be resolved helped to prevent the crash and hang, at least.
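For readers wondering what "delaying the startup until the domain name resolves" could look like, here is a rough sketch. It is not the workaround posted earlier in this thread (that text is not reproduced here); `wait_for_dns` and all of its parameters are hypothetical names, and the `resolver` and `sleep` arguments exist only so the function can be exercised without a network:

```python
import socket
import time


def wait_for_dns(host="discord.com", port=443, attempts=30, delay=2.0,
                 resolver=socket.getaddrinfo, sleep=time.sleep):
    """Return True once `host` resolves, or False if every attempt fails."""
    for attempt in range(attempts):
        try:
            resolver(host, port)
            return True
        except socket.gaierror:
            # Same error class as in the traceback above
            # ("Name or service not known"); wait and retry.
            if attempt < attempts - 1:
                sleep(delay)
    return False
```

A launcher script could call this before starting the bot and exit with a non-zero status if it returns False. As noted above, this only sidesteps the startup case and does nothing about the hang itself.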
After some more poking around I found that if an exception isn't caught inside run_bot() and sys.exit() isn't called, the bot will hang inside the exception handler and shutdown handler.
Edit: I stopped the exception handler from hanging in the first place instead. |
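The hang described here can be illustrated in isolation: if the main task fails and nothing stops the event loop, `run_forever()` never returns. The sketch below is not Red's code; `main_task`, `run`, and the exit code value 1 are hypothetical stand-ins. It shows one way to make sure the loop is stopped and a non-zero code is reported when the task crashes:

```python
import asyncio


async def main_task() -> None:
    # Stand-in for the bot's main coroutine failing during startup.
    raise ConnectionError("cannot reach discord.com")


def run() -> int:
    """Run main_task on its own loop and return an exit code instead of hanging."""
    loop = asyncio.new_event_loop()
    exit_code = 0

    def on_done(task: asyncio.Task) -> None:
        nonlocal exit_code
        if not task.cancelled() and task.exception() is not None:
            exit_code = 1  # hypothetical "crashed" exit code
        # Without this call, run_forever() below would never return.
        loop.stop()

    try:
        task = loop.create_task(main_task())
        task.add_done_callback(on_done)
        loop.run_forever()
    finally:
        loop.close()
    return exit_code
```

An entry point would then pass the result to `sys.exit()`, so the service manager sees the failure instead of a process that is still alive.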
It would seem that Red may infinitely hang during the handling of such exceptions.
An example: