Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mutil: introduce simple BlockBeat to fix itest flakes #8717

Draft
wants to merge 16 commits into
base: elle-0-18-1
Choose a base branch
from

Conversation

yyforyongyu
Copy link
Collaborator

In an attempt to fix the itest flakes found in the sweeper/multi-hop tests, I realized there's no easy way to get around the block sync issue. In fact, I think these flakes already mean block sync may cause issues outside of the test context. Thus, a minimal version of BlockBeat is implemented.

Atm, when a new block is received, it's sent to the subsystems concurrently,

flowchart
    block((block))
    
    block ---> c[ChainArb]
    block ---> ca1[ChannelArb1]
    block ---> ca2[ChannelArb2]
		block ---> us[UtxoSweeper]
		block ---> b[Bumper]
    block ---> ar[AnchorResolver]
    block ---> cr[CommitResolver]
    block ---> hr[HtlcResolver...]

One system may process the block faster than another, causing a block sync issue. Blockbeat will make sure the blocks are sent in the following order,

flowchart
    block((block))
    
    block --> c[ChainArb]
    c --> ca1[ChannelArb1]
		c --> ca2[ChannelArb2]
		c --> ca3[ChannelArb...]
    ca1 ---> ar[AnchorResolver]
    ca1 ---> cr[CommitResolver]
    ca1 ---> hr[HtlcResolver...]
    ca2 ---> rs2[Resolvers...]
    ca3 ---> rs3[Resolvers...]
		ar ---> us[UtxoSweeper]
		cr ---> us[UtxoSweeper]
	  hr ---> us[UtxoSweeper]
	  rs2 ---> us[UtxoSweeper]
	  rs3 ---> us[UtxoSweeper]
		us ---> b[Bumper]

which means the following new behavior in the sweeper:

  1. no more waiting for the next block to trigger the sweep - when an input is sent to the sweeper at height X, the sweeping won't happen until block X+1, which is no longer the case.
  2. no need to immediately sweep inputs during restart - this hack can be removed now.
  3. no more different views of block heights among these subsystems - the fee func can now accurately calculate its width.

TODOs:

  • fix unit tests
  • add unit tests
  • fix itests -> new pr to limit the scope

…ckEpoch`

So the block data can be used by subsystems without calling `GetBlock`.
…tems

In this commit, a minimal version of `BlockBeat` is added to synchronize
block heights, which will be used in `ChainArb`, `Sweeper`, and
`TxPublisher` so blocks are processed sequentially among them.
Also updated the loggings. This new state will be used in the following
commit.
This commit adds a new method `handleInitialBroadcast` to handle the
initial broadcast. Previously we'd broadcast immediately inside
`Broadcast`, which soon will not work giving the blockbeat being used in
the following commit.
Previously in `markInputFailed`, we'd remove all inputs under the same
group via `removeExclusiveGroup`. This is wrong as when the current
sweep fails for this input, it shouldn't affect other inputs.
…b` to use blockbeat

In this commit, we replace the individual block subscription with the
implementation of the interface method `ProcessBlock` so they share a
single block notifier.
This commit refactors the block dispatching logic in `ChainArbitrator`
so the blocks are sent concurrently to all active channel arbitrators.
It also makes sure the blockbeat is now sent to channel arbitrators.
This commit changes the `activeResolvers` map so each active resolver
now has a block chan to receive new blocks.
After this commit, when a new block comes, it will be passed through
chainArb -> ChannelArbitrator -> resolvers.
This `immediate` flag was added as a hack so during a restart, the
pending resolvers would offer the inputs to the sweeper and ask it to
sweep them immediately. This is no longer need due to `blockbeat`, as
now during restart, a block is always sent to all subsystems via the
flow `ChainArb` -> `ChannelArb` -> resolvers -> sweeper. Thus, when
there are pending inputs offered, they will be processed by the sweeper
immediately.
Also updated `handlePendingSweepsReq` to skip immature inputs so the
returned results are the same as those in pre-0.18.0.
The sweeper can handle the waiting so there's no need to wait for blocks
inside the resolvers. By offering the inputs prior to their mature
heights also guarantees the inputs with the same deadline are
aggregated.
@yyforyongyu yyforyongyu added testing Improvements/modifications to the test suite utxo sweeping itests Issues related to integration tests. labels May 2, 2024
@yyforyongyu yyforyongyu added this to the v0.18.1 milestone May 2, 2024
Copy link

coderabbitai bot commented May 2, 2024

Important

Auto Review Skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@morehouse
Copy link
Collaborator

So the basic idea is to synchronize the entire contractcourt and sweeper on each block?

Do arrows in the above diagrams represent passing around the latest block info? It seems odd to have all the resolvers separately notifying the sweeper of the latest block.

Is it guaranteed that a BlockBeat fires on startup with the latest block info? I think this is required to eliminate the immediate sweep on startup logic.

@yyforyongyu
Copy link
Collaborator Author

yyforyongyu commented May 3, 2024

So the basic idea is to synchronize the entire contractcourt and sweeper on each block?

Yes.

Do arrows in the above diagrams represent passing around the latest block info? It seems odd to have all the resolvers separately notifying the sweeper of the latest block.

It represents the timeline of the block, maybe I should try a different graph, but the block is always sent by the BlockBeat.

Is it guaranteed that a BlockBeat fires on startup with the latest block info? I think this is required to eliminate the immediate sweep on startup logic.

Yes, when a new subscription is made via RegisterBlockEpochNtfn, the current block should always be sent.

@morehouse
Copy link
Collaborator

Concept ACK.

It represents the timeline of the block, maybe I should try a different graph, but the block is always sent by the BlockBeat.

Yeah, the diagram is a bit confusing. From the code it looks like ChainArbitrator passes block info down to channel arbitrators, which pass it to resolvers. But the sweeper gets the block info directly from BlockBeat, right?

So basically BlockBeat (1) sends block info to contractcourt, then (2) sends block info to sweeper. There's a single entry point for each subsystem, each of which is responsible for distributing the block info to the rest of the subsystem. Something like this?

flowchart LR

  block([new block]) --> bb[BlockBeat]
  bb --1--> c[ChainArbitrator]
  bb --2--> us[UtxoSweeper]
  c ~~~ us
  subgraph cc[contractcourt]
    c --> ca1[ChannelArb1]
		c --> ca2[ChannelArb2]
		c --> ca3[ChannelArb...]
    ca1 ---> ar[AnchorResolver]
    ca1 ---> cr[CommitResolver]
    ca1 ---> hr[HtlcResolver...]
    ca2 ---> rs2[Resolvers...]
    ca3 ---> rs3[Resolvers...]
  end
  subgraph s[sweep]
		us ---> b[Bumper]
  end

@yyforyongyu
Copy link
Collaborator Author

But the sweeper gets the block info directly from BlockBeat, right?

Yeah exactly.

So basically BlockBeat (1) sends block info to contractcourt, then (2) sends block info to sweeper. There's a single entry point for each subsystem, each of which is responsible for distributing the block info to the rest of the subsystem. Something like this?

Close enough! I'd like to view Bumper an independent system as it can be moved out of the sweep package. Another important new behavior is the subsystem must ACK a done signal back after processing the block - this would defintely require more attention when designing/refactoring subsystems, as it forces a designer to define what does it mean when we say the subsystem has finished processing a block. Maybe it's more clear using a sequence graph,

sequenceDiagram
		autonumber
		participant bb as BlockBeat
		participant cc as ChainArb
		participant us as UtxoSweeper
		participant tp as TxPublisher
		
		note left of bb: 0. received block x,<br>dispatching...
		
    note over bb,cc: 1. send block x to ChainArb,<br>wait for its done signal
		bb->>cc: block x
		rect rgba(165, 0, 85, 0.8)
      critical signal processed
        cc->>bb: processed block
      option Process error or timeout
        bb->>bb: error and exit
      end
    end

    note over bb,us: 2. send block x to UtxoSweeper, wait for its done signal
		bb->>us: block x
		rect rgba(165, 0, 85, 0.8)
      critical signal processed
        us->>bb: processed block
      option Process error or timeout
        bb->>bb: error and exit
      end
    end

    note over bb,tp: 3. send block x to TxPublisher, wait for its done signal
		bb->>tp: block x
		rect rgba(165, 0, 85, 0.8)
      critical signal processed
        tp->>bb: processed block
      option Process error or timeout
        bb->>bb: error and exit
      end
    end

@morehouse
Copy link
Collaborator

Cool. I think this could be used to implement #8166. We would just need to make sure that the initial block is distributed by BlockBeat before the peer manager starts.

@morehouse
Copy link
Collaborator

Does TxPublisher need to process the block before UtxoSweeper?

Suppose a conflicting transaction confirms at block X (e.g., #8737). If UtxoSweeper processes block X first, it won't know about any inputs that need to be rebatched due to the conflicting transaction. So publishing a rebatched transaction won't happen until block X+1 confirms.

But if TxPublisher processes block X first, it will fail the conflicted inputs, thereby allowing UtxoSweeper to publish a rebatched transaction immediately.

@yyforyongyu
Copy link
Collaborator Author

Good question @morehouse. I need to think about it more, whether UtxoSweeper -> TxPublisher, or the other around, or in parallel. The issue with TxPublisher -> UtxoSweeper means every tx will be broadcast at X+1. Meanwhile, once this TODO is in place, I think we can eliminate creating conflicting txns completely?

reconstruct the input set from the mempool tx, which is the missing part that gives the sweeper the ability to recover its state from a restart.

@morehouse
Copy link
Collaborator

The issue with TxPublisher -> UtxoSweeper means every tx will be broadcast at X+1.

I don't think so? TxPublisher.Broadcast will still do the initial broadcast immediately.

Meanwhile, once this TODO is in place, I think we can eliminate creating conflicting txns completely?

Conflicting transactions aren't only created by us. The counterparty can also spend HTLCs, causing conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
itests Issues related to integration tests. testing Improvements/modifications to the test suite utxo sweeping
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants