#8796: Worker not consuming tasks after Redis broker restart, maybe not fixed in version 5.4.0 #8990

Open
zjqzero opened this issue Apr 29, 2024 · 6 comments

zjqzero commented Apr 29, 2024

[worker-0.log] after restarting Redis:

```
[2024-04-29 15:39:32,181: WARNING/MainProcess] /usr/local/lib/python3.12/site-packages/celery/worker/consumer/consumer.py:508: CPendingDeprecationWarning: The broker_connection_retry configuration setting will no longer determine
whether broker connection retries are made during startup in Celery 6.0 and above.
If you wish to retain the existing behavior for retrying connections on startup,
you should set broker_connection_retry_on_startup to True.
warnings.warn(

[2024-04-29 15:39:32,183: ERROR/MainProcess] consumer: Cannot connect to redis://192.168.6.21:6379/0: Error 111 connecting to 192.168.6.21:6379. Connection refused..
Trying again in 2.00 seconds... (1/100)

[2024-04-29 15:39:34,190: INFO/MainProcess] Connected to redis://192.168.6.21:6379/0
[2024-04-29 15:39:34,191: WARNING/MainProcess] /usr/local/lib/python3.12/site-packages/celery/worker/consumer/consumer.py:508: CPendingDeprecationWarning: The broker_connection_retry configuration setting will no longer determine
whether broker connection retries are made during startup in Celery 6.0 and above.
If you wish to retain the existing behavior for retrying connections on startup,
you should set broker_connection_retry_on_startup to True.
warnings.warn(

[2024-04-29 15:39:34,196: INFO/MainProcess] mingle: searching for neighbors
[2024-04-29 15:39:35,205: INFO/MainProcess] mingle: all alone
[2024-04-29 15:39:40,218: INFO/MainProcess] missed heartbeat from 55c7a36024c3
```

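The CPendingDeprecationWarning in the log above concerns the `broker_connection_retry` setting, not the stuck consumer itself. A minimal sketch of the opt-in it suggests, using a hypothetical app name and broker URL in the style of this thread (setting it silences the warning; it is not known to fix this issue):

```python
from celery import Celery

# Hypothetical app name and broker URL, chosen only to match the setup
# described in this issue; not taken from the reporter's actual code.
app = Celery("hello", broker="redis://192.168.6.21:6379/0")

# Opt-in suggested by the warning: keep retrying the broker connection
# at startup in Celery 6.0 and later as well.
app.conf.broker_connection_retry_on_startup = True
```
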
[version]

python 3.12 + redis 7.2.4 + celery 5.4.0

```
Package Version
------- -------
amqp 5.2.0
APScheduler 3.10.4
beautifulsoup4 4.12.3
bidict 0.23.1
billiard 4.2.0
blinker 1.8.1
bs4 0.0.2
cachelib 0.9.0
celery 5.4.0
certifi 2024.2.2
charset-normalizer 3.3.2
click 8.1.7
click-didyoumean 0.3.1
click-plugins 1.1.1
click-repl 0.3.0
dnspython 2.6.1
et-xmlfile 1.1.0
Flask 3.0.3
Flask-APScheduler 1.13.1
Flask-Caching 2.2.0
Flask-Cors 4.0.0
h11 0.14.0
hiredis 2.3.2
idna 3.7
itsdangerous 2.2.0
Jinja2 3.1.3
kombu 5.3.7
lml 0.1.0
Logbook 1.7.0.post0
loguru 0.7.2
lxml 5.2.1
MarkupSafe 2.1.5
marshmallow 3.21.1
mongoengine 0.28.2
numpy 1.26.4
openpyxl 3.1.2
packaging 24.0
pandas 2.2.2
pip 24.0
prompt-toolkit 3.0.43
pyexcel-io 0.6.6
pyexcel-xlsx 0.6.0
pymongo 4.7.0
python-dateutil 2.9.0.post0
python-engineio 4.9.0
python-redis-lock 4.0.0
python-socketio 5.11.2
pytz 2024.1
raven 6.10.0
redis 5.0.4
requests 2.31.0
setuptools 69.2.0
simple-websocket 1.0.0
six 1.16.0
soupsieve 2.5
supervisor 4.2.5
tenacity 8.2.3
thulac 0.2.2
tzdata 2024.1
tzlocal 5.2
urllib3 2.2.1
uWSGI 2.0.25.1
vine 5.1.0
wcwidth 0.2.13
Werkzeug 3.0.2
wheel 0.43.0
wsproto 1.2.0
```

Therefore I have to revert to Celery 5.2.2.

zjqzero (Author) commented May 7, 2024

I have tested it many times on Celery 5.2.2 + Kombu 5.2.2 without any problems.
The production environment also uses this version.

However, upgrading Kombu to a version higher than 5.2.2, such as 5.3.7, frequently introduces issues.

I have reproduced this scenario manually. Here are some logs.

Based on my observations, Kombu seems to be the most likely source of this issue.
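
For context, the `test.hello` task exercised in the logs below is not included in the report; a minimal sketch of what the reproducer module might look like, inferred only from `celery -A test.app worker`, the app name `hello`, and the `'hello world'` result in the logs:

```python
# test.py — hypothetical reproducer, reconstructed from the worker banner
# below (app name "hello", Redis broker, task test.hello -> "hello world").
# Started in the log below with: celery -A test.app worker -l INFO
from celery import Celery

app = Celery("hello", broker="redis://192.168.6.21:6379/0")

@app.task(name="test.hello")
def hello():
    return "hello world"
```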

```
Package Version
------- -------
...
celery 5.2.2
...
kombu 5.3.7
...
```

```
user@gamma:~$ celery -A test.app worker -l INFO

 -------------- celery@gamma v5.2.2 (dawn-chorus)
--- ***** -----
-- ******* ---- Linux-5.15.0-101-generic-x86_64-with-glibc2.35 2024-05-07 10:34:32
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app:         hello:0x7f5caf7f23e0
- ** ---------- .> transport:   redis://192.168.6.21:6379/0
- ** ---------- .> results:     disabled://
- *** --- * --- .> concurrency: 20 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
 -------------- [queues]
                .> celery           exchange=celery(direct) key=celery

[tasks]
. test.hello

[2024-05-07 10:34:32,591: INFO/MainProcess] Connected to redis://192.168.6.21:6379/0
[2024-05-07 10:34:32,597: INFO/MainProcess] mingle: searching for neighbors
[2024-05-07 10:34:33,609: INFO/MainProcess] mingle: sync with 2 nodes
[2024-05-07 10:34:33,609: INFO/MainProcess] mingle: sync complete
[2024-05-07 10:34:33,623: INFO/MainProcess] celery@gamma ready.
[2024-05-07 10:34:51,097: INFO/MainProcess] Task test.hello[8ead87e5-2c63-4449-86e7-a19f4a168dc4] received
[2024-05-07 10:34:51,099: INFO/ForkPoolWorker-16] Task test.hello[8ead87e5-2c63-4449-86e7-a19f4a168dc4] succeeded in 0.00034789182245731354s: 'hello world'
```

Run the worker and send a test task; it works. Then restart Redis:

```
[2024-05-07 10:35:25,473: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
...
redis.exceptions.ConnectionError: Connection closed by server.
...

[2024-05-07 10:35:25,480: ERROR/MainProcess] consumer: Cannot connect to redis://192.168.6.21:6379/0: Error 111 connecting to 192.168.6.21:6379. Connection refused..
Trying again in 2.00 seconds... (1/100)

[2024-05-07 10:35:27,488: INFO/MainProcess] Connected to redis://192.168.6.21:6379/0
[2024-05-07 10:35:27,493: INFO/MainProcess] mingle: searching for neighbors
[2024-05-07 10:35:28,498: INFO/MainProcess] mingle: all alone

[2024-05-07 10:35:38,419: INFO/MainProcess] Task test.hello[9739f7cd-5a4d-4cfa-9473-ebe131a10bce] received
[2024-05-07 10:35:38,420: INFO/ForkPoolWorker-16] Task test.hello[9739f7cd-5a4d-4cfa-9473-ebe131a10bce] succeeded in 8.22567380964756e-05s: 'hello world'
```

Test again; it still works. Then restart Redis again:

```
[2024-05-07 10:35:54,940: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
...

[2024-05-07 10:35:54,946: ERROR/MainProcess] consumer: Cannot connect to redis://192.168.6.21:6379/0: Error 111 connecting to 192.168.6.21:6379. Connection refused..
Trying again in 2.00 seconds... (1/100)

[2024-05-07 10:35:56,952: INFO/MainProcess] Connected to redis://192.168.6.21:6379/0
[2024-05-07 10:35:56,957: INFO/MainProcess] mingle: searching for neighbors
[2024-05-07 10:35:57,963: INFO/MainProcess] mingle: all alone

[2024-05-07 10:36:04,115: INFO/MainProcess] Task test.hello[ea4fd100-6d29-47c3-82b0-41e03a635781] received
[2024-05-07 10:36:04,117: INFO/ForkPoolWorker-16] Task test.hello[ea4fd100-6d29-47c3-82b0-41e03a635781] succeeded in 8.298782631754875e-05s: 'hello world'
```

Test again; it still works. Then restart Redis again:

```
[2024-05-07 10:36:11,417: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):

...

[2024-05-07 10:36:11,422: ERROR/MainProcess] consumer: Cannot connect to redis://192.168.6.21:6379/0: Error 111 connecting to 192.168.6.21:6379. Connection refused..
Trying again in 2.00 seconds... (1/100)

[2024-05-07 10:36:13,431: INFO/MainProcess] Connected to redis://192.168.6.21:6379/0
[2024-05-07 10:36:13,436: INFO/MainProcess] mingle: searching for neighbors
[2024-05-07 10:36:14,445: INFO/MainProcess] mingle: all alone
[2024-05-07 10:36:19,456: INFO/MainProcess] missed heartbeat from 9d80da8705ff
[2024-05-07 10:36:19,456: INFO/MainProcess] missed heartbeat from 59dbc68b852c
```

At this point, the worker no longer consumes tasks.
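
Each "test" step above presumably amounts to sending a single task, along the lines of this hypothetical sender (results are disabled in this setup, so success is judged from the worker log rather than a return value):

```python
# Hypothetical sender for each "test" step; if the consumer is still alive,
# the worker log prints "Task test.hello[...] received" shortly afterwards.
from test import hello  # assumes the test.py sketch shown earlier

hello.delay()
```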

@Nusnus

zjqzero (Author) commented May 7, 2024

I have run some tests:
celery 5.4.0 + kombu 5.2.2 passed
celery 5.3.6 + kombu 5.2.2 passed
celery 5.3.6 + kombu 5.2.3 failed
celery 5.2.2 + kombu 5.2.3 failed
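
For anyone repeating this matrix, a trivial check (not from the original comment) to confirm which versions are actually importable in the environment under test:

```python
# Print the installed Celery/Kombu versions before each test run.
import celery
import kombu

print("celery", celery.__version__)
print("kombu", kombu.__version__)
```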

I hope this is helpful! @Nusnus

Nusnus (Member) commented May 7, 2024

> celery 5.4.0 + kombu 5.2.2 passed

Did this mix have Redis working after multiple restarts?? 🙏 👀
@zjqzero

Nusnus self-assigned this May 7, 2024
Nusnus added this to the 5.4.x milestone May 7, 2024
zjqzero (Author) commented May 7, 2024

> celery 5.4.0 + kombu 5.2.2 passed
>
> Did this mix have Redis working after multiple restarts?? 🙏 👀 @zjqzero

Yes, it works. I've tried it manually several times, though since the testing was not automated, I'm not fully confident that my procedure reproduced the bug conditions every time.

GyR4uk commented May 22, 2024

Hi, I also see this problem on my instances after upgrading to Celery 5.4.0 with the Redis broker.

@zjqzero After restarting Redis, Celery stops consuming tasks.

GyR4uk commented May 24, 2024

```
celery_1       | [2024-05-24 07:09:55,113: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
celery_1       | Traceback (most recent call last):
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/celery/worker/consumer/consumer.py", line 340, in start
celery_1       |     blueprint.start(self)
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/celery/bootsteps.py", line 116, in start
celery_1       |     step.start(parent)
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/celery/worker/consumer/consumer.py", line 746, in start
celery_1       |     c.loop(*c.loop_args())
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/celery/worker/loops.py", line 97, in asynloop
celery_1       |     next(loop)
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/kombu/asynchronous/hub.py", line 373, in create_loop
celery_1       |     cb(*cbargs)
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/kombu/transport/redis.py", line 1344, in on_readable
celery_1       |     self.cycle.on_readable(fileno)
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/kombu/transport/redis.py", line 569, in on_readable
celery_1       |     chan.handlers[type]()
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/kombu/transport/redis.py", line 913, in _receive
celery_1       |     ret.append(self._receive_one(c))
celery_1       |                ^^^^^^^^^^^^^^^^^^^^
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/kombu/transport/redis.py", line 923, in _receive_one
celery_1       |     response = c.parse_response()
celery_1       |                ^^^^^^^^^^^^^^^^^^
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/redis/client.py", line 837, in parse_response
celery_1       |     response = self._execute(conn, try_read)
celery_1       |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/redis/client.py", line 813, in _execute
celery_1       |     return conn.retry.call_with_retry(
celery_1       |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/redis/retry.py", line 49, in call_with_retry
celery_1       |     fail(error)
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/redis/client.py", line 815, in <lambda>
celery_1       |     lambda error: self._disconnect_raise_connect(conn, error),
celery_1       |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/redis/client.py", line 802, in _disconnect_raise_connect
celery_1       |     raise error
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/redis/retry.py", line 46, in call_with_retry
celery_1       |     return do()
celery_1       |            ^^^^
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/redis/client.py", line 814, in <lambda>
celery_1       |     lambda: command(*args, **kwargs),
celery_1       |             ^^^^^^^^^^^^^^^^^^^^^^^^
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/redis/client.py", line 835, in try_read
celery_1       |     return conn.read_response(disconnect_on_error=False, push_request=True)
celery_1       |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/redis/connection.py", line 512, in read_response
celery_1       |     response = self._parser.read_response(disable_decoding=disable_decoding)
celery_1       |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/redis/_parsers/resp2.py", line 15, in read_response
celery_1       |     result = self._read_response(disable_decoding=disable_decoding)
celery_1       |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/redis/_parsers/resp2.py", line 25, in _read_response
celery_1       |     raw = self._buffer.readline()
celery_1       |           ^^^^^^^^^^^^^^^^^^^^^^^
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/redis/_parsers/socket.py", line 115, in readline
celery_1       |     self._read_from_socket()
celery_1       |   File "/opt/pysetup/.venv/lib/python3.11/site-packages/redis/_parsers/socket.py", line 68, in _read_from_socket
celery_1       |     raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
celery_1       | redis.exceptions.ConnectionError: Connection closed by server.
celery_1       | [2024-05-24 07:09:55,118: WARNING/MainProcess] /opt/pysetup/.venv/lib/python3.11/site-packages/celery/worker/consumer/consumer.py:391: CPendingDeprecationWarning:
celery_1       | In Celery 5.1 we introduced an optional breaking change which
celery_1       | on connection loss cancels all currently executed tasks with late acknowledgement enabled.
celery_1       | These tasks cannot be acknowledged as the connection is gone, and the tasks are automatically redelivered
celery_1       | back to the queue. You can enable this behavior using the worker_cancel_long_running_tasks_on_connection_loss
celery_1       | setting. In Celery 5.1 it is set to False by default. The setting will be set to True by default in Celery 6.0.
celery_1       |
celery_1       |   warnings.warn(CANCEL_TASKS_BY_DEFAULT, CPendingDeprecationWarning)
celery_1       |
celery_1       | [2024-05-24 07:09:55,118: INFO/MainProcess] Temporarily reducing the prefetch count to 40 to avoid over-fetching since 2 tasks are currently being processed.
celery_1       | The prefetch count will be gradually restored to 48 as the tasks complete processing.
celery_1       | [2024-05-24 07:09:55,121: WARNING/MainProcess] /opt/pysetup/.venv/lib/python3.11/site-packages/celery/worker/consumer/consumer.py:508: CPendingDeprecationWarning: The broker_connection_retry configuration setting will no longer determine
celery_1       | whether broker connection retries are made during startup in Celery 6.0 and above.
celery_1       | If you wish to retain the existing behavior for retrying connections on startup,
celery_1       | you should set broker_connection_retry_on_startup to True.
celery_1       |   warnings.warn(
celery_1       |
celery_1       | [2024-05-24 07:09:55,122: ERROR/MainProcess] consumer: Cannot connect to redis://10.11.0.92:6379/0: Error while reading from 10.11.0.92:6379 : (104, 'Connection reset by peer').
celery_1       | Trying again in 2.00 seconds... (1/100)
celery_1       |
celery_1       | [2024-05-24 07:09:57,145: INFO/MainProcess] Connected to redis://10.11.0.92:6379/0
celery_1       | [2024-05-24 07:09:57,146: WARNING/MainProcess] /opt/pysetup/.venv/lib/python3.11/site-packages/celery/worker/consumer/consumer.py:508: CPendingDeprecationWarning: The broker_connection_retry configuration setting will no longer determine
celery_1       | whether broker connection retries are made during startup in Celery 6.0 and above.
celery_1       | If you wish to retain the existing behavior for retrying connections on startup,
celery_1       | you should set broker_connection_retry_on_startup to True.
celery_1       |   warnings.warn(
celery_1       |
celery_1       | [2024-05-24 07:09:57,150: INFO/MainProcess] mingle: searching for neighbors
celery_1       | [2024-05-24 07:09:58,155: INFO/MainProcess] mingle: all alone
```

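The second CPendingDeprecationWarning in this log refers to the `worker_cancel_long_running_tasks_on_connection_loss` setting; a minimal sketch of opting in to the future default, assuming a hypothetical app object (this changes how late-acknowledged tasks are handled on connection loss and is not a confirmed fix for the stuck consumer):

```python
from celery import Celery

# Hypothetical app name and broker URL, for illustration only.
app = Celery("tasks", broker="redis://10.11.0.92:6379/0")

# Opt in to the behaviour described in the warning: on connection loss,
# cancel currently executing tasks with late acknowledgement so that they
# are redelivered once the broker is back.
app.conf.worker_cancel_long_running_tasks_on_connection_loss = True
```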