How to retry a timed out job? #401
Comments
There are many settings on the worker class, and you can set a maximum number of retries for specific jobs. Please see the documentation. I also suggest reading this part: https://arq-docs.helpmanual.io/#retrying-jobs-and-cancellation
Right, I've been using Retry and job aborting successfully, but am struggling with timeouts-vs-retries. Is it a correct expectation that a timed-out job (job runs longer than I tried catching the error, but it doesn't propagate up to where I invoke
try:
    arq.run_worker(WorkerSettings)
except Exception as e:
    # Never hit on TimeoutError
    logging.exception('worker error')
By the way, this is what the worker logs when you run my reproduction steps:
Context
I'm trying to add some fail-safes around a resource-intensive job with a lot of external dependencies, so it can sometimes hang or OOM. It usually works on the next retry.
Issue
I'd like to set a job-specific timeout and have it retry after a TimeoutError, but I can't figure out how to do that. The TimeoutError seems to be terminal and I can't get it to retry... any advice on how to make this work?
See related issue: #402
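One possible workaround, sketched here under assumptions rather than taken from the arq docs: enforce a shorter timeout inside the job itself with asyncio.wait_for, and convert the resulting TimeoutError into arq's Retry, which the thread reports as working. The names heavy_job, do_heavy_work and INNER_TIMEOUT_S are hypothetical, and the fallback Retry class exists only so the sketch runs without arq installed.

```python
import asyncio

try:
    from arq import Retry  # arq's exception for asking the worker to re-run a job
except ImportError:
    class Retry(RuntimeError):  # stand-in so this sketch runs without arq
        def __init__(self, defer=None):
            super().__init__(f"retry requested, defer={defer}")
            self.defer = defer

# Hypothetical value: keep it below the worker-level job timeout so this
# inner timeout fires first and the job gets a chance to request a retry.
INNER_TIMEOUT_S = 0.05


async def do_heavy_work(delay):
    # Placeholder for the resource-intensive work that sometimes hangs.
    await asyncio.sleep(delay)
    return "done"


async def heavy_job(ctx, delay):
    try:
        return await asyncio.wait_for(do_heavy_work(delay), timeout=INNER_TIMEOUT_S)
    except asyncio.TimeoutError:
        # Back off a little more on each attempt; "job_try" is the attempt
        # counter arq places in ctx (defaulted here for standalone runs).
        raise Retry(defer=ctx.get("job_try", 1) * 5)
```

This only helps when the job is stuck awaiting something cancellable; a hard hang in blocking code or an OOM kill would not reach the except branch.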
Reproduction
python script.py worker
python script.py client