There is a race: engine connections to schedulers take a finite amount of time, so clients can be informed of an engine's registration before the engine is fully connected to all schedulers.
The current registration design effectively assumes heartbeats will complete after registration is complete, but the faster launch-then-use nature of the cluster Python API reveals the race.
Using the zmq.PROBE_ROUTER socket option (ZMQ_PROBE_ROUTER in libzmq) should allow us to delay the registration notification until the schedulers have received probe messages from the engine on every connection.
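As a rough illustration of the probe mechanism, here is a minimal pyzmq sketch, not the actual ipyparallel implementation. It assumes a single scheduler represented by one ROUTER socket and an engine represented by one DEALER socket; the helper names `wait_for_probes` and `demo` and the identity `engine-0` are hypothetical. With zmq.PROBE_ROUTER set before connecting, the engine socket sends an empty message once the connection is established, and the ROUTER receives it as the engine's identity frame followed by an empty frame.

```python
import time

import zmq


def wait_for_probes(router, expected, timeout_ms=5000):
    """Return the identities in `expected` whose probe arrived before the timeout.

    Hypothetical helper: a probe is a message of [identity, b""] on the ROUTER.
    """
    expected = set(expected)
    seen = set()
    deadline = time.monotonic() + timeout_ms / 1000
    while seen != expected:
        remaining_ms = int((deadline - time.monotonic()) * 1000)
        if remaining_ms <= 0 or not router.poll(remaining_ms):
            break  # timed out before every probe arrived
        identity, *rest = router.recv_multipart()
        if rest == [b""] and identity in expected:
            seen.add(identity)
    return seen


def demo():
    ctx = zmq.Context()
    # Stand-in for one scheduler socket.
    router = ctx.socket(zmq.ROUTER)
    router.linger = 0
    port = router.bind_to_random_port("tcp://127.0.0.1")

    # Stand-in for the engine's connection to that scheduler.
    engine = ctx.socket(zmq.DEALER)
    engine.linger = 0
    engine.setsockopt(zmq.IDENTITY, b"engine-0")
    # Must be set before connect(): sends an empty message once connected.
    engine.setsockopt(zmq.PROBE_ROUTER, 1)
    engine.connect(f"tcp://127.0.0.1:{port}")

    try:
        return wait_for_probes(router, {b"engine-0"})
    finally:
        engine.close()
        router.close()
        ctx.term()


if __name__ == "__main__":
    print(demo())
```

In a real deployment each scheduler would do this bookkeeping for its own socket, and the registration notification would only go out once every scheduler has reported the engine's probe. How those reports get back to the component that notifies clients is left open here.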
This should be not only more rigorous but also faster, because we can wait for the actual event (everything is connected) rather than needing to set a long heartbeat and/or registration timeout.