Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: invalid txs_results returned for legacy ABCI responses #3031

Open
wants to merge 38 commits into
base: main
Choose a base branch
from

Conversation

andynog
Copy link
Contributor

@andynog andynog commented May 7, 2024

close: #3002

This PR fixes the issue reported above.

This is not a storage issue in particular, the results are still in storage after an upgrade, but not returned properly by the RPC endpoint. The fix is to make the /block_results endpoint in v0.38 to return properly a legacy ABCI response created with v0.37.

Once this fix is merged on v0.38 and a patch release is cut, any node on v0.38 (e.g. an archive node) that applies the patch release, should have the results returned properly by the RPC /block_results endpoint.


PR checklist

  • Tests written/updated
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments
  • Title follows the Conventional Commits spec

@andynog andynog added bug Something isn't working abci Application blockchain interface rpc labels May 7, 2024
@andynog andynog self-assigned this May 7, 2024
@andynog andynog requested review from a team as code owners May 7, 2024 19:30
Copy link
Contributor

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

state/store.go Outdated Show resolved Hide resolved
state/store.go Outdated Show resolved Hide resolved
state/errors.go Outdated Show resolved Hide resolved
state/store.go Outdated Show resolved Hide resolved
state/store.go Outdated Show resolved Hide resolved
@andynog andynog added backport-to-v0.38.x Tell Mergify to backport the PR to v0.38.x backport-to-v1.x Tell Mergify to backport the PR to v1.x labels May 8, 2024
@andynog
Copy link
Contributor Author

andynog commented May 10, 2024

I am working on some additional tests to ensure the logic holds and it makes sense

Copy link
Contributor

@cason cason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reserve against this change.

I would move the added test unit elsewhere thought.

api/api_test.go Outdated Show resolved Hide resolved
rpc/core/blocks.go Show resolved Hide resolved
rpc/core/blocks.go Show resolved Hide resolved
state/store.go Outdated Show resolved Hide resolved
state/store.go Outdated Show resolved Hide resolved
state/store.go Outdated
if err := legacyResp.Unmarshal(buf); err != nil {
// only return an error, this method is only invoked through the `/block_results` not for state logic and
// some tests, so no need to exit cometbft if there's an error, just return it.
store.Logger.Debug("failed to unmarshall legacy ABCI response: %s", err.Error())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
store.Logger.Debug("failed to unmarshall legacy ABCI response: %s", err.Error())
store.Logger.Debug("failed to unmarshall FinalizeBlockResponse (also tried as legacy ABCI response): %s", err.Error())

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also include the error you get in line 662. The reason is, if it was indeed not a legacy response, but it failed to deserialise, the error you get when you later try to deserialise as legacy is not useful.

state/store.go Outdated
// only return an error, this method is only invoked through the `/block_results` not for state logic and
// some tests, so no need to exit cometbft if there's an error, just return it.
store.Logger.Debug("failed to unmarshall legacy ABCI response: %s", err.Error())
return nil, ErrABCIResponseCorruptedOrSpecChangeForHeight{Height: height}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we don't break any API (not sure) it'd be good to include the error inside this error, so the caller can print it if needed

}

responseDeliverTx := abciv1beta2.ResponseDeliverTx{
Code: abci.CodeTypeOK,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you populate the rest of the fields here? (GasWanted, Codespace, etc)

Comment on lines +39 to +41
require.Equal(t, 1, len(legacyABCIResponse.DeliverTxs))
require.Equal(t, 1, len(legacyABCIResponse.BeginBlock.Events))
require.Equal(t, 1, len(legacyABCIResponse.EndBlock.Events))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be more exhaustive checking the values of these

Comment on lines +60 to +61
require.Error(t, err)
require.ErrorContains(t, err, "unexpected EOF")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand, by looking at the production changes below, I had understood that the bug is that this unmarshalling actually doesn't error, but leaves some fields unfilled...

Comment on lines +118 to +120
require.NotNil(t, legacyResponseWithNull.DeliverTxs)
require.Nil(t, legacyResponseWithNull.EndBlock)
require.NotNil(t, legacyResponseWithNull.BeginBlock)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we do the same checks here as in TestStateProtoV1Beta2ToV1 above?

@mbreithecker
Copy link

Hey, is there already an timeline when this fix will be released?
We are depending on it and it is very critical.

/block and /block_results are super important queries for cosmos/cometBFT nodes.

@mbreithecker
Copy link

Please also add the mode attribute to the legacy response. Otherwise it is impossible to tell the difference between an BeginBlock and EndBlock event:

state/store.go

// responseFinalizeBlockFromLegacy is a convenience function that takes the old abci responses and morphs
// it to the finalize block response. Note that the app hash is missing
func responseFinalizeBlockFromLegacy(legacyResp *cmtstate.LegacyABCIResponses) *abci.ResponseFinalizeBlock {

	// Add BeginBlock attribute to BeginBlock events
	for idx, event := range legacyResp.BeginBlock.Events {
		legacyResp.BeginBlock.Events[idx].Attributes = append(event.Attributes, abci.EventAttribute{
			Key:   "mode",
			Value: "BeginBlock",
			Index: false,
		})
	}

	// Add EndBlock attribute to BeginBlock events
	for idx, event := range legacyResp.EndBlock.Events {
		legacyResp.EndBlock.Events[idx].Attributes = append(event.Attributes, abci.EventAttribute{
			Key:   "mode",
			Value: "EndBlock",
			Index: false,
		})
	}

	return &abci.ResponseFinalizeBlock{
		TxResults:             legacyResp.DeliverTxs,
		ValidatorUpdates:      legacyResp.EndBlock.ValidatorUpdates,
		ConsensusParamUpdates: legacyResp.EndBlock.ConsensusParamUpdates,
		Events:                append(legacyResp.BeginBlock.Events, legacyResp.EndBlock.Events...),
		// NOTE: AppHash is missing in the response but will
		// be caught and filled in consensus/replay.go
	}
}

@johnletey
Copy link
Contributor

Noble would love to see this released as well!

@andynog
Copy link
Contributor Author

andynog commented Jun 7, 2024

Just an update, still working on this, need to add some additional testing to ensure these changes are good especially backporting to old releases. But hopefully should be ready to be merged sooner than later.

@andynog
Copy link
Contributor Author

andynog commented Jun 12, 2024

Just an update. The fix is still on-going. We believe we have a solution but might be hard to test with live production data (archive nodes). Would anyone that benefits from this fix be able to test against real data before we cut an official release ?

Once the fix is in, we could backport to v0.38.x branch (but we would not cut a release until we have some confirmation the fix works). Then you can test using a binary compiled from that branch and let us know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
abci Application blockchain interface backport-to-v0.38.x Tell Mergify to backport the PR to v0.38.x backport-to-v1.x Tell Mergify to backport the PR to v1.x bug Something isn't working rpc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

txs_results not returned properly on dYdX nodes
6 participants