Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consensus: change gossip routines sleep and signaling setup #3007

Open
Tracked by #3245
ValarDragon opened this issue May 6, 2024 · 5 comments
Open
Tracked by #3245

consensus: change gossip routines sleep and signaling setup #3007

ValarDragon opened this issue May 6, 2024 · 5 comments
Labels
consensus enhancement New feature or request

Comments

@ValarDragon
Copy link
Collaborator

Feature Request

Summary

Currently every gossip routine attempts to gossip a vote or block part. If it has no more block parts to gossip, it sleeps (by default 100ms). This has three issues:

  • We get an unnecessary delay for first part gossip. When we are ready to gossip new block parts, we will be unnecessarily sleeping for another rand[0,100]ms.
  • If we only have one block part, we gossip it to a peer, then we have nothing else to gossip, we will sleep for 100ms. If another block part comes in, we have to wait that [0,100]ms.
  • Wasted CPU when we are not in the relevant gossip phase

Proposal

Make each gossip thread use a channel that "wakes it up".

For block part gossip:

  • When we receive a new block part, go to the next block, or detect our peer is on a new block, restart our loop.

For vote gossip:

  • When we receive a new vote, go to the next block, or detect our peer is on a new block, restart our loop.

Otherwise keep the current structure for a loop that is "awake".

@ValarDragon ValarDragon added enhancement New feature or request needs-triage This issue/PR has not yet been triaged by the team. labels May 6, 2024
@cason
Copy link
Contributor

cason commented May 13, 2024

In summary, we need a way to signal to the gossip data routines that new block parts are, or might be, available to be sent. This is a limitation of the current design, where consensus logic and reactor "communicate" via shared memory.

A possibility that we have discussed is to leverage the existing synchronous signal cs.evsw.FireEvent(types.EventProposalBlockPart, msg) between the consensus logic and reactor.

@cason cason changed the title Change gossip routines sleep setup consensus: change gossip routines sleep and signaling setup May 13, 2024
@cason
Copy link
Contributor

cason commented May 13, 2024

For votes, there is also a synchronous event produced by the consensus logic: cs.evsw.FireEvent(types.EventVote, vote).

@nmarcetic
Copy link

Maybe is cheaper to KISS, run it in a loop, and just reduce sleep, especially when a peer is behind in gossipDataForCatchup (probably the root cause of slowness; maybe we could handle that in another thread and keep the main loop cleaner 🤷‍♂️ ).
I like the idea of channels, but it could add extra complexity (also more things to change).
For votes, I don't understand the purpose of those sleeps 😢

@cason
Copy link
Contributor

cason commented Jun 11, 2024

For votes, I don't understand the purpose of those sleeps 😢

We only sleep when we don't find a vote to send to the peer. We sleep because we retrieve votes from the consensus logic via shared memory. The option we have been discussing is to explicitly signalize the reactor when there are new stuff (e.g., votes) available in the shared state to be sent.

While we find votes suitable to be sent, we skip the sleeps in the loops.

@cason cason removed the needs-triage This issue/PR has not yet been triaged by the team. label Jun 11, 2024
@cason
Copy link
Contributor

cason commented Jun 11, 2024

This can be in conjunction with #3211.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consensus enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants