bug: crons with concurrency limits set cause the engine to crash #483

Closed
trisongz opened this issue May 10, 2024 · 3 comments · Fixed by #486
Comments

@trisongz

I have hatchet self-hosted in a K8s cluster.

Container Images:

  • engine: ghcr.io/hatchet-dev/hatchet/hatchet-engine:v0.26.1
  • api: ghcr.io/hatchet-dev/hatchet/hatchet-api:v0.26.1
  • rabbitmq: docker.io/bitnami/rabbitmq:3.13.2-debian-12-r0

SDK: Python - hatchet-sdk-0.23.0 (0.22.5 prior)

After version 0.23.0, I've consistently run into the following issue when a cron task gets triggered, which then causes a reboot loop on the engine container:

2024-05-10T15:34:53.555Z INF workflow 491b44e5-34ad-4764-847b-fedf8f838362 has concurrency settings service=workflows-controller
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x78 pc=0x14053cf]

goroutine 41 [running]:
github.com/hatchet-dev/hatchet/internal/services/controllers/workflows.(*WorkflowsControllerImpl).scheduleGetGroupAction(0xc00372ef50, {0x1b11638?, 0xc000751320?}, 0x0)
	/hatchet/internal/services/controllers/workflows/queue.go:211 +0xcf
github.com/hatchet-dev/hatchet/internal/services/controllers/workflows.(*WorkflowsControllerImpl).handleWorkflowRunQueued(0xc00372ef50, {0x1b11590?, 0x3d2f720?}, 0xc000750900)
	/hatchet/internal/services/controllers/workflows/queue.go:72 +0x618
github.com/hatchet-dev/hatchet/internal/services/controllers/workflows.(*WorkflowsControllerImpl).handleTask(0xc0004b3ad0?, {0x1b11590, 0x3d2f720}, 0xc000750900)
	/hatchet/internal/services/controllers/workflows/controller.go:199 +0x117
github.com/hatchet-dev/hatchet/internal/services/controllers/workflows.(*WorkflowsControllerImpl).Start.func1(0xc0007ba000?)
	/hatchet/internal/services/controllers/workflows/controller.go:161 +0x90
github.com/hatchet-dev/hatchet/internal/msgqueue/rabbitmq.(*MessageQueueImpl).subscribe.func1.2({{0x1b0f940, 0xc00062c7e0}, 0x0, {0x0, 0x0}, {0x0, 0x0}, 0x0, 0x0, {0x0, ...}, ...})
	/hatchet/internal/msgqueue/rabbitmq/rabbitmq.go:502 +0x88b
created by github.com/hatchet-dev/hatchet/internal/msgqueue/rabbitmq.(*MessageQueueImpl).subscribe.func1 in goroutine 154
	/hatchet/internal/msgqueue/rabbitmq/rabbitmq.go:451 +0x5c6

I've attempted the following to debug:

  • Deleting all workflows, which removes the cron schedules and lets the engine container come back up.
  • Recreating RabbitMQ, including its persistent data, which has no effect.

I am able to trigger the workflow manually without issue, but whenever the cron schedule triggers the workflow, the crash occurs.
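For reference, the failing setup is roughly a cron-triggered workflow that also defines a concurrency group. The sketch below is illustrative only (it is not my exact workflow definition) and assumes the decorator-style hatchet-sdk Python API from that release line; the workflow name, cron expression, and the import path for ConcurrencyLimitStrategy should be checked against the SDK version in use.

```python
from hatchet_sdk import ConcurrencyLimitStrategy, Context, Hatchet

hatchet = Hatchet()


# Illustrative cron workflow that also sets a concurrency limit, which is the
# combination that triggers the panic above. Names and values are placeholders.
@hatchet.workflow(on_crons=["*/15 * * * *"])
class ExampleCronWorkflow:
    # Concurrency key function: every run maps to the same group key,
    # limited to a single concurrent run.
    @hatchet.concurrency(max_runs=1, limit_strategy=ConcurrencyLimitStrategy.GROUP_ROUND_ROBIN)
    def concurrency(self, context: Context) -> str:
        return "default"

    @hatchet.step()
    def step1(self, context: Context):
        return {"status": "ok"}


worker = hatchet.worker("example-worker")
worker.register_workflow(ExampleCronWorkflow())
worker.start()
```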

@abelanger5
Contributor

Hey @trisongz, thanks for the report - I'll be taking a look at this today. This looks like an issue where the workflow run isn't created properly from cron workflows when a concurrency limit is set on the workflow. This isn't an issue with RabbitMQ, so there's no need to restart anything on that side (the methods are just being triggered by a RabbitMQ message).

@trisongz
Author

Thanks for the response. I can confirm that after removing concurrency from the workflow, the latest version works.

abelanger5 changed the title from "bug: rabbitmq subscribe issue with hatchet-engine > 0.23.0" to "bug: crons with concurrency limits set cause the engine to crash" on May 11, 2024
@abelanger5
Contributor

This is fixed in v0.26.2
