Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch on_complete is never called because of Poison pill #6234

Open
route opened this issue Mar 19, 2024 · 3 comments
Open

Batch on_complete is never called because of Poison pill #6234

route opened this issue Mar 19, 2024 · 3 comments

Comments

@route
Copy link

route commented Mar 19, 2024

Ruby version: 2.7.8p225
Rails version: 6.1.5.1
Sidekiq / Pro / Enterprise version(s): 7.1.2

  1. None poison pill scenario.

When creating a batch with 1 job and 3 retries, sidekiq will call on_complete callback anyways even job is failing on the first attempt. After 3 attempts failing job will be moved to a dead list. Counters for the batch will be: Job Count: 1 | Pending: 1 | Failed: 1

  1. Poison pill scenario.

Suppose we have a job that crashes sidekiq. Same scenario with batch doesn't work now in this case. on_complete callback is never invoked as Failed counter is not incremented to 1 even though sidekiq + superfetch tried to recover this orphan 3 times and gave up moving it to dead list. Sidekiq before moving job to dead list should increment counters correctly and consider this job as failed otherwise batch is not closed correctly.

@mperham
Copy link
Collaborator

mperham commented Mar 19, 2024

That's all the expected behavior. A Poison Pill kills the entire process, I can't mark it as failed because the code is no longer running.

When the poison pill finally dies, the batch code should fire the on_death callback.

@route
Copy link
Author

route commented Mar 19, 2024

We know that after n attempts the job is considered a poison pill and thus we move it to a dead list, isn’t it a good place to keep the counters up to date? In both scenarios the job is failing, but only one calls callback.

I need to check how on_death works.

@mperham
Copy link
Collaborator

mperham commented Mar 19, 2024

Ok, I can see how marking the poison pill as failed in the batch would be useful. I'll look into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants