Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem: Celery worker not working #127

Closed
jraddaoui opened this issue Mar 13, 2019 · 15 comments
Closed

Problem: Celery worker not working #127

jraddaoui opened this issue Mar 13, 2019 · 15 comments
Assignees
Projects
Milestone

Comments

@jraddaoui
Copy link
Collaborator

jraddaoui commented Mar 13, 2019

In recent installations the Celery worker crashes with the following error:

[2019-03-13 12:25:36,423: CRITICAL/MainProcess] Unrecoverable error: VersionMismatch('Redis transport requires redis-py versions 3.2.0 or later. You have 2.10.6',)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/kombu/transport/virtual/base.py", line 921, in create_channel
    return self._avail_channels.pop()
IndexError: pop from empty list
 
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/celery/worker/worker.py", line 205, in start
    self.blueprint.start(self)
  File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 369, in start
    return self.obj.start()
  File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 317, in start
    blueprint.start(self)
  File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/connection.py", line 23, in start
    c.connection = c.connect()
  File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 404, in connect
    conn = self.connection_for_read(heartbeat=self.amqheartbeat)
  File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 411, in connection_for_read
    self.app.connection_for_read(heartbeat=heartbeat))
  File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 435, in ensure_connected
    callback=maybe_shutdown,
  File "/usr/local/lib/python3.6/site-packages/kombu/connection.py", line 422, in ensure_connection
    callback, timeout=timeout)
  File "/usr/local/lib/python3.6/site-packages/kombu/utils/functional.py", line 341, in retry_over_time
    return fun(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/kombu/connection.py", line 275, in connect
    return self.connection
  File "/usr/local/lib/python3.6/site-packages/kombu/connection.py", line 823, in connection
    self._connection = self._establish_connection()
  File "/usr/local/lib/python3.6/site-packages/kombu/connection.py", line 778, in _establish_connection
    conn = self.transport.establish_connection()
  File "/usr/local/lib/python3.6/site-packages/kombu/transport/virtual/base.py", line 941, in establish_connection
    self._avail_channels.append(self.create_channel(self))
  File "/usr/local/lib/python3.6/site-packages/kombu/transport/virtual/base.py", line 923, in create_channel
    channel = self.Channel(connection)
  File "/usr/local/lib/python3.6/site-packages/kombu/transport/redis.py", line 495, in __init__
    self.Client = self._get_client()
  File "/usr/local/lib/python3.6/site-packages/kombu/transport/redis.py", line 956, in _get_client
    'You have {0.__version__}'.format(redis))
kombu.exceptions.VersionMismatch: Redis transport requires redis-py versions 3.2.0 or later. You have 2.10.6

More info:

Proposed solution is to pin kombu==4.3.0 until Celery 4.3 gets released.

@jraddaoui jraddaoui self-assigned this Mar 13, 2019
@cole
Copy link
Contributor

cole commented Mar 13, 2019

Ugh, that is annoying. Too bad they're not using semver properly. Proposed solution sounds good, I think it's just important to remember to update kombu when you update celery in future.

@jraddaoui
Copy link
Collaborator Author

Thanks @cole. Unfortunately, pinning kombu didn't solve the issue. We also have redis-py pinned to 2.10.6 and it seems to be some important fixes in 3.2.0. How would you feel about upgrading redis-py and use celery==4.3.0rc2 for the time being?

@jraddaoui
Copy link
Collaborator Author

Just tried with that and it looks like I'm having some issues building the Docker container as I'm getting the same issue. I'll try with the kombu pinning again to see if I'm able to properly re-build the container with the new deps.

@cole
Copy link
Contributor

cole commented Mar 13, 2019

@jraddaoui I haven't tested the kombu pinning — just saw that it is working for others in the linked issues and it sounds ok in principle. Maybe try pinning kombu and updating redis? It seems that 4.3 requires redis 3.2+

@jraddaoui
Copy link
Collaborator Author

Okay, I got it working with both setups. I'll pin kombu until celery 4.3 is released. As @cole mentions we'll need to upgrade or remove the kombu dependency when we upgrade celery and also upgrade redis to 3.2.0 or higher.

@jraddaoui
Copy link
Collaborator Author

@jraddaoui I haven't tested the kombu pinning — just saw that it is working for others in the linked issues and it sounds ok in principle. Maybe try pinning kombu and updating redis? It seems that 4.3 requires redis 3.2+

Sorry @cole, I missed that update.

kombu 4.4 requires redis 3.2, but they don't work well with celery 4.2 and celery 4.3 is still a RC. Pinning kombu to 4.3 works in the end (it was a Docker issue on my end) and I think it's the best option until celery 4.3 is released.

@cole
Copy link
Contributor

cole commented Mar 13, 2019

Sounds good!

@jraddaoui
Copy link
Collaborator Author

Thanks @cole!

@sallain sallain closed this as completed Mar 14, 2019
@sallain sallain reopened this Mar 14, 2019
@sallain sallain added the review label Mar 14, 2019
@sallain
Copy link
Collaborator

sallain commented Mar 14, 2019

@stefanabreitwieser If DIP upload and descendant updates work, this can be closed.

@jraddaoui
Copy link
Collaborator Author

Sorry, the application has not been updated in the test server, I'll mark this for review once the search filters pull request is merged and deployed, which will also include the fix for this issue.

@jraddaoui
Copy link
Collaborator Author

This is now ready for review in the test instance.

@jraddaoui
Copy link
Collaborator Author

jraddaoui commented Mar 20, 2019

The worker crashed again with the following error:

[2019-03-20 05:51:45,476: CRITICAL/MainProcess] Unrecoverable error: ResponseError('MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.',)
Traceback (most recent call last):
  File "/home/scope/dip-access-interface/venv/lib/python3.5/site-packages/celery/worker/worker.py", line 205, in start
    self.blueprint.start(self)
  File "/home/scope/dip-access-interface/venv/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/home/scope/dip-access-interface/venv/lib/python3.5/site-packages/celery/bootsteps.py", line 369, in start
    return self.obj.start()
  File "/home/scope/dip-access-interface/venv/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 317, in start
    blueprint.start(self)
  File "/home/scope/dip-access-interface/venv/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
    step.start(parent)
  File "/home/scope/dip-access-interface/venv/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 593, in start
    c.loop(*c.loop_args())
  File "/home/scope/dip-access-interface/venv/lib/python3.5/site-packages/celery/worker/loops.py", line 91, in asynloop
    next(loop)
  File "/home/scope/dip-access-interface/venv/lib/python3.5/site-packages/kombu/asynchronous/hub.py", line 362, in create_loop
    cb(*cbargs)
  File "/home/scope/dip-access-interface/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 1065, in on_readable
    self.cycle.on_readable(fileno)
  File "/home/scope/dip-access-interface/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 348, in on_readable
    chan.handlers[type]()
  File "/home/scope/dip-access-interface/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 725, in _brpop_read
    **options)
  File "/home/scope/dip-access-interface/venv/lib/python3.5/site-packages/redis/client.py", line 680, in parse_response
    response = connection.read_response()
  File "/home/scope/dip-access-interface/venv/lib/python3.5/site-packages/redis/connection.py", line 629, in read_response
    raise response
redis.exceptions.ResponseError: MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error.

From the Redis service logs:

Mar 20 09:36:41 scope redis-server[1425]: 1425:M 20 Mar 2019 09:36:41.054 * 1 changes in 900 seconds. Saving...
Mar 20 09:36:41 scope redis-server[1425]: 1425:M 20 Mar 2019 09:36:41.056 * Background saving started by pid 5508
Mar 20 09:36:41 scope redis-server[1425]: 5508:C 20 Mar 2019 09:36:41.064 * DB saved on disk
Mar 20 09:36:41 scope redis-server[1425]: 5508:C 20 Mar 2019 09:36:41.064 * RDB: 0 MB of memory used by copy-on-write
Mar 20 09:36:41 scope redis-server[1425]: 1425:M 20 Mar 2019 09:36:41.157 * Background saving terminated with success
Mar 20 09:51:42 scope redis-server[1425]: 1425:M 20 Mar 2019 09:51:42.158 * 1 changes in 900 seconds. Saving...
Mar 20 09:51:42 scope redis-server[1425]: 1425:M 20 Mar 2019 09:51:42.174 # Can't save in background: fork: Cannot allocate memory
Mar 20 09:51:48 scope redis-server[1425]: 1425:M 20 Mar 2019 09:51:48.111 * 1 changes in 900 seconds. Saving...
Mar 20 09:51:48 scope redis-server[1425]: 1425:M 20 Mar 2019 09:51:48.142 # Can't save in background: fork: Cannot allocate memory
Mar 20 09:51:54 scope redis-server[1425]: 1425:M 20 Mar 2019 09:51:54.159 * 1 changes in 900 seconds. Saving...
Mar 20 09:51:54 scope redis-server[1425]: 1425:M 20 Mar 2019 09:51:54.163 # Can't save in background: fork: Cannot allocate memory
Mar 20 09:52:00 scope redis-server[1425]: 1425:M 20 Mar 2019 09:52:00.178 * 1 changes in 900 seconds. Saving...
Mar 20 09:52:00 scope redis-server[1425]: 1425:M 20 Mar 2019 09:52:00.196 # Can't save in background: fork: Cannot allocate memory
Mar 20 09:52:06 scope redis-server[1425]: 1425:M 20 Mar 2019 09:52:06.136 * 1 changes in 900 seconds. Saving...
Mar 20 09:52:06 scope redis-server[1425]: 1425:M 20 Mar 2019 09:52:06.150 # Can't save in background: fork: Cannot allocate memory
Mar 20 09:52:12 scope redis-server[1425]: 1425:M 20 Mar 2019 09:52:12.083 * 1 changes in 900 seconds. Saving...
Mar 20 09:52:12 scope redis-server[1425]: 1425:M 20 Mar 2019 09:52:12.083 * Background saving started by pid 5637
Mar 20 09:52:12 scope redis-server[1425]: 5637:C 20 Mar 2019 09:52:12.087 * DB saved on disk
Mar 20 09:52:12 scope redis-server[1425]: 5637:C 20 Mar 2019 09:52:12.087 * RDB: 0 MB of memory used by copy-on-write
Mar 20 09:52:12 scope redis-server[1425]: 1425:M 20 Mar 2019 09:52:12.183 * Background saving terminated with success
Mar 20 18:43:34 scope redis-server[1425]: 1425:M 20 Mar 2019 18:43:34.252 * 1 changes in 900 seconds. Saving...
Mar 20 18:43:34 scope redis-server[1425]: 1425:M 20 Mar 2019 18:43:34.253 * Background saving started by pid 7102
Mar 20 18:43:34 scope redis-server[1425]: 7102:C 20 Mar 2019 18:43:34.257 * DB saved on disk
Mar 20 18:43:34 scope redis-server[1425]: 7102:C 20 Mar 2019 18:43:34.258 * RDB: 0 MB of memory used by copy-on-write
Mar 20 18:43:34 scope redis-server[1425]: 1425:M 20 Mar 2019 18:43:34.353 * Background saving terminated with success

This looks to be again a memory issue. I have restarted the worker for now, but we should revisit this and probably monitor the memory usage by the worker (big METS files are dumped in memory entirely in the import process), alongside Elasticsearch and Redis.

@jraddaoui
Copy link
Collaborator Author

Only Elasticsearch is currently taking 1.3 GB of memory from the 2 GB available in that instance:

root@scope:~# systemctl status elasticsearch
● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-03-19 20:35:45 UTC; 24h ago
     Docs: http://www.elastic.co
 Main PID: 1443 (java)
    Tasks: 54
   Memory: 1.3G
      CPU: 12min 26.116s
   CGroup: /system.slice/elasticsearch.service
           ├─1443 /usr/bin/java -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -Xss1m -Djava.a
           └─1835 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller

Which doesn't leave a lot of space to parse big METS files at this time.

@jraddaoui
Copy link
Collaborator Author

Hi @stefanabreitwieser and @bunekcca,

I guess you noted the same issue again about a couple of hours ago. I forgot that we're also running Kibana in there which, alongside Elasticsearch, were taking almost all the memory in the server, so I've increased it from 2 to 4 GB. Please, let me know how it goes.

@stefanabreitwieser
Copy link
Collaborator

It's working a lot faster -- thanks Radda!

@sallain sallain added this to Review in SCOPE Mar 26, 2019
@sallain sallain added this to the phase2 milestone Mar 27, 2019
@sallain sallain moved this from Review to Done in SCOPE Mar 28, 2019
@sallain sallain moved this from Done to Review in SCOPE Mar 28, 2019
@sallain sallain closed this as completed Mar 28, 2019
SCOPE automation moved this from Review to Done Mar 28, 2019
@sallain sallain removed the review label Mar 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
SCOPE
  
Done
Development

No branches or pull requests

4 participants