mutil: introduce simple `BlockBeat` to fix itest flakes #8717

yyforyongyu · 2024-05-02T06:46:50Z

In an attempt to fix the itest flakes found in the sweeper/multi-hop tests, I realized there's no easy way to get around the block sync issue. In fact, I think these flakes already mean block sync may cause issues outside of the test context. Thus, a minimal version of BlockBeat is implemented.

Atm, when a new block is received, it's sent to the subsystems concurrently,

flowchart
    block((block))
    
    block ---> c[ChainArb]
    block ---> ca1[ChannelArb1]
    block ---> ca2[ChannelArb2]
		block ---> us[UtxoSweeper]
		block ---> b[Bumper]
    block ---> ar[AnchorResolver]
    block ---> cr[CommitResolver]
    block ---> hr[HtlcResolver...]

One system may process the block faster than another, causing a block sync issue. Blockbeat will make sure the blocks are sent in the following order,

flowchart
    block((block))
    
    block --> c[ChainArb]
    c --> ca1[ChannelArb1]
		c --> ca2[ChannelArb2]
		c --> ca3[ChannelArb...]
    ca1 ---> ar[AnchorResolver]
    ca1 ---> cr[CommitResolver]
    ca1 ---> hr[HtlcResolver...]
    ca2 ---> rs2[Resolvers...]
    ca3 ---> rs3[Resolvers...]
		ar ---> us[UtxoSweeper]
		cr ---> us[UtxoSweeper]
	  hr ---> us[UtxoSweeper]
	  rs2 ---> us[UtxoSweeper]
	  rs3 ---> us[UtxoSweeper]
		us ---> b[Bumper]

which means the following new behavior in the sweeper:

no more waiting for the next block to trigger the sweep - when an input is sent to the sweeper at height X, the sweeping won't happen until block X+1, which is no longer the case.
no need to immediately sweep inputs during restart - this hack can be removed now.
no more different views of block heights among these subsystems - the fee func can now accurately calculate its width.

TODOs:

fix unit tests
add unit tests
fix itests -> new pr to limit the scope

…ckEpoch` So the block data can be used by subsystems without calling `GetBlock`.

…tems In this commit, a minimal version of `BlockBeat` is added to synchronize block heights, which will be used in `ChainArb`, `Sweeper`, and `TxPublisher` so blocks are processed sequentially among them.

Also updated the loggings. This new state will be used in the following commit.

This commit adds a new method `handleInitialBroadcast` to handle the initial broadcast. Previously we'd broadcast immediately inside `Broadcast`, which soon will not work giving the blockbeat being used in the following commit.

Previously in `markInputFailed`, we'd remove all inputs under the same group via `removeExclusiveGroup`. This is wrong as when the current sweep fails for this input, it shouldn't affect other inputs.

…b` to use blockbeat In this commit, we replace the individual block subscription with the implementation of the interface method `ProcessBlock` so they share a single block notifier.

This commit refactors the block dispatching logic in `ChainArbitrator` so the blocks are sent concurrently to all active channel arbitrators. It also makes sure the blockbeat is now sent to channel arbitrators.

This commit changes the `activeResolvers` map so each active resolver now has a block chan to receive new blocks.

After this commit, when a new block comes, it will be passed through chainArb -> ChannelArbitrator -> resolvers.

This `immediate` flag was added as a hack so during a restart, the pending resolvers would offer the inputs to the sweeper and ask it to sweep them immediately. This is no longer need due to `blockbeat`, as now during restart, a block is always sent to all subsystems via the flow `ChainArb` -> `ChannelArb` -> resolvers -> sweeper. Thus, when there are pending inputs offered, they will be processed by the sweeper immediately.

Also updated `handlePendingSweepsReq` to skip immature inputs so the returned results are the same as those in pre-0.18.0.

The sweeper can handle the waiting so there's no need to wait for blocks inside the resolvers. By offering the inputs prior to their mature heights also guarantees the inputs with the same deadline are aggregated.

coderabbitai · 2024-05-02T06:46:56Z

Important

Auto Review Skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai generate interesting stats about this repository and render them as a table.
- @coderabbitai show all the console.log statements in this repository.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

morehouse · 2024-05-02T22:17:27Z

So the basic idea is to synchronize the entire contractcourt and sweeper on each block?

Do arrows in the above diagrams represent passing around the latest block info? It seems odd to have all the resolvers separately notifying the sweeper of the latest block.

Is it guaranteed that a BlockBeat fires on startup with the latest block info? I think this is required to eliminate the immediate sweep on startup logic.

yyforyongyu · 2024-05-03T05:48:27Z

So the basic idea is to synchronize the entire contractcourt and sweeper on each block?

Yes.

Do arrows in the above diagrams represent passing around the latest block info? It seems odd to have all the resolvers separately notifying the sweeper of the latest block.

It represents the timeline of the block, maybe I should try a different graph, but the block is always sent by the BlockBeat.

Is it guaranteed that a BlockBeat fires on startup with the latest block info? I think this is required to eliminate the immediate sweep on startup logic.

Yes, when a new subscription is made via RegisterBlockEpochNtfn, the current block should always be sent.

morehouse · 2024-05-03T16:18:30Z

Concept ACK.

It represents the timeline of the block, maybe I should try a different graph, but the block is always sent by the BlockBeat.

Yeah, the diagram is a bit confusing. From the code it looks like ChainArbitrator passes block info down to channel arbitrators, which pass it to resolvers. But the sweeper gets the block info directly from BlockBeat, right?

So basically BlockBeat (1) sends block info to contractcourt, then (2) sends block info to sweeper. There's a single entry point for each subsystem, each of which is responsible for distributing the block info to the rest of the subsystem. Something like this?

flowchart LR

  block([new block]) --> bb[BlockBeat]
  bb --1--> c[ChainArbitrator]
  bb --2--> us[UtxoSweeper]
  c ~~~ us
  subgraph cc[contractcourt]
    c --> ca1[ChannelArb1]
		c --> ca2[ChannelArb2]
		c --> ca3[ChannelArb...]
    ca1 ---> ar[AnchorResolver]
    ca1 ---> cr[CommitResolver]
    ca1 ---> hr[HtlcResolver...]
    ca2 ---> rs2[Resolvers...]
    ca3 ---> rs3[Resolvers...]
  end
  subgraph s[sweep]
		us ---> b[Bumper]
  end

yyforyongyu · 2024-05-04T07:47:06Z

But the sweeper gets the block info directly from BlockBeat, right?

Yeah exactly.

So basically BlockBeat (1) sends block info to contractcourt, then (2) sends block info to sweeper. There's a single entry point for each subsystem, each of which is responsible for distributing the block info to the rest of the subsystem. Something like this?

Close enough! I'd like to view Bumper an independent system as it can be moved out of the sweep package. Another important new behavior is the subsystem must ACK a done signal back after processing the block - this would defintely require more attention when designing/refactoring subsystems, as it forces a designer to define what does it mean when we say the subsystem has finished processing a block. Maybe it's more clear using a sequence graph,

sequenceDiagram
		autonumber
		participant bb as BlockBeat
		participant cc as ChainArb
		participant us as UtxoSweeper
		participant tp as TxPublisher
		
		note left of bb: 0. received block x,<br>dispatching...
		
    note over bb,cc: 1. send block x to ChainArb,<br>wait for its done signal
		bb->>cc: block x
		rect rgba(165, 0, 85, 0.8)
      critical signal processed
        cc->>bb: processed block
      option Process error or timeout
        bb->>bb: error and exit
      end
    end

    note over bb,us: 2. send block x to UtxoSweeper, wait for its done signal
		bb->>us: block x
		rect rgba(165, 0, 85, 0.8)
      critical signal processed
        us->>bb: processed block
      option Process error or timeout
        bb->>bb: error and exit
      end
    end

    note over bb,tp: 3. send block x to TxPublisher, wait for its done signal
		bb->>tp: block x
		rect rgba(165, 0, 85, 0.8)
      critical signal processed
        tp->>bb: processed block
      option Process error or timeout
        bb->>bb: error and exit
      end
    end

morehouse · 2024-05-06T16:03:48Z

Cool. I think this could be used to implement #8166. We would just need to make sure that the initial block is distributed by BlockBeat before the peer manager starts.

morehouse · 2024-05-07T18:47:39Z

Does TxPublisher need to process the block before UtxoSweeper?

Suppose a conflicting transaction confirms at block X (e.g., #8737). If UtxoSweeper processes block X first, it won't know about any inputs that need to be rebatched due to the conflicting transaction. So publishing a rebatched transaction won't happen until block X+1 confirms.

But if TxPublisher processes block X first, it will fail the conflicted inputs, thereby allowing UtxoSweeper to publish a rebatched transaction immediately.

yyforyongyu · 2024-05-08T07:29:46Z

Good question @morehouse. I need to think about it more, whether UtxoSweeper -> TxPublisher, or the other around, or in parallel. The issue with TxPublisher -> UtxoSweeper means every tx will be broadcast at X+1. Meanwhile, once this TODO is in place, I think we can eliminate creating conflicting txns completely?

reconstruct the input set from the mempool tx, which is the missing part that gives the sweeper the ability to recover its state from a restart.

morehouse · 2024-05-08T15:08:42Z

The issue with TxPublisher -> UtxoSweeper means every tx will be broadcast at X+1.

I don't think so? TxPublisher.Broadcast will still do the initial broadcast immediately.

Meanwhile, once this TODO is in place, I think we can eliminate creating conflicting txns completely?

Conflicting transactions aren't only created by us. The counterparty can also spend HTLCs, causing conflicts.

yyforyongyu added 16 commits May 2, 2024 14:33

chainntnfs+lnd: replace BlockHeader with Block in `chainntnfs.Blo…

8200589

…ckEpoch` So the block data can be used by subsystems without calling `GetBlock`.

chainio: add blockbeat to handle block synchronization among subsys…

e8abbd6

…tems In this commit, a minimal version of `BlockBeat` is added to synchronize block heights, which will be used in `ChainArb`, `Sweeper`, and `TxPublisher` so blocks are processed sequentially among them.

sweep: add new state TxError for erroneous sweepings

0963f98

Also updated the loggings. This new state will be used in the following commit.

sweep: remove redundant error from Broadcast

70d7264

sweep: add method handleBumpEventError and fix markInputFailed

bbe20d6

Previously in `markInputFailed`, we'd remove all inputs under the same group via `removeExclusiveGroup`. This is wrong as when the current sweep fails for this input, it shouldn't affect other inputs.

sweep+contractcourt: update UtxoSweeper, TxPublisher and `ChainAr…

19f807d

…b` to use blockbeat In this commit, we replace the individual block subscription with the implementation of the interface method `ProcessBlock` so they share a single block notifier.

contractcourt: use blockbeat in ChannelArbitrator

d94bc66

This commit refactors the block dispatching logic in `ChainArbitrator` so the blocks are sent concurrently to all active channel arbitrators. It also makes sure the blockbeat is now sent to channel arbitrators.

contractcourt: init activeResolvers with a block chan

9f5053d

This commit changes the `activeResolvers` map so each active resolver now has a block chan to receive new blocks.

contractcourt: remove block subscription used in resolvers

c8e8578

After this commit, when a new block comes, it will be passed through chainArb -> ChannelArbitrator -> resolvers.

lnd: start blockbeat service and register subsystems

6290852

sweep: add method isMature on SweeperInput

871116a

Also updated `handlePendingSweepsReq` to skip immature inputs so the returned results are the same as those in pre-0.18.0.

contractcourt: remove waitForHeight in resolvers

fb7c5c1

The sweeper can handle the waiting so there's no need to wait for blocks inside the resolvers. By offering the inputs prior to their mature heights also guarantees the inputs with the same deadline are aggregated.

sweep: make sure defaultDeadline is derived from the mature height

bcfcf88

itest: fix sweep tests and remove hacks

fe8a3a4

yyforyongyu added testing Improvements/modifications to the test suite utxo sweeping itests Issues related to integration tests. labels May 2, 2024

yyforyongyu added this to the v0.18.1 milestone May 2, 2024

saubyk mentioned this pull request May 7, 2024

sweep: finalize and enhance the new sweeper's functionality #8680

Open

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mutil: introduce simple `BlockBeat` to fix itest flakes #8717

mutil: introduce simple `BlockBeat` to fix itest flakes #8717

yyforyongyu commented May 2, 2024

coderabbitai bot commented May 2, 2024

Auto Review Skipped

Chat

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configration File (`.coderabbit.yaml`)

Documentation and Community

morehouse commented May 2, 2024

yyforyongyu commented May 3, 2024 •

edited

morehouse commented May 3, 2024

yyforyongyu commented May 4, 2024

morehouse commented May 6, 2024

morehouse commented May 7, 2024

yyforyongyu commented May 8, 2024

morehouse commented May 8, 2024

mutil: introduce simple BlockBeat to fix itest flakes #8717

Are you sure you want to change the base?

mutil: introduce simple BlockBeat to fix itest flakes #8717

Conversation

yyforyongyu commented May 2, 2024

coderabbitai bot commented May 2, 2024

Auto Review Skipped

Chat

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configration File (.coderabbit.yaml)

Documentation and Community

morehouse commented May 2, 2024

yyforyongyu commented May 3, 2024 • edited

morehouse commented May 3, 2024

yyforyongyu commented May 4, 2024

morehouse commented May 6, 2024

morehouse commented May 7, 2024

yyforyongyu commented May 8, 2024

morehouse commented May 8, 2024

mutil: introduce simple `BlockBeat` to fix itest flakes #8717

mutil: introduce simple `BlockBeat` to fix itest flakes #8717

CodeRabbit Configration File (`.coderabbit.yaml`)

yyforyongyu commented May 3, 2024 •

edited