When a worker pod is killed, no mechanism for retrying task #765

NikBisht · 2022-05-11T20:56:59Z

We're using the Redis broker with the DynamoDB backend, and we've noticed that when a worker pod is terminated ungracefully while a task is still running, the task stays in the STARTED state. It seems that Machinery has no timeout after which tasks that have been in STARTED for a long time are re-queued. This seems like a critical feature for fault tolerance.

taylorzhangyx · 2022-08-29T11:59:09Z

I face the same issue here.

zhouhui521 · 2022-09-23T09:12:32Z

I face the same issue here.

kushalhalder · 2022-11-18T11:20:28Z

Are there any updates on this, or any workarounds?
