Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: provide first-class ssz support on api #6749

Open
wants to merge 269 commits into
base: unstable
Choose a base branch
from
Open

Conversation

nflaig
Copy link
Member

@nflaig nflaig commented May 8, 2024

Motivation

Initial discussion around this has started in #5128 to provide first-class support for SSZ in both request and response bodies but besides that, there are plenty of other shortcomings in the way we currently define routes and handle server / client responses. In a lot of cases, Lodestar was not following the spec (e.g. not setting all required headers) causing incompatibilities with other clients. The main reason for this is because our API spec tests were incomplete, headers were not checked at all or if we support SSZ on all required routes as per spec, and addressing those shortcomings requires workarounds like overriding handlers on both the client and server.

Description

This PR changes the structure of how routes are defined and makes it trivial to add SSZ support to any route. The only exception are routes with data payloads that cannot be represented as a SSZ type due to lack of JSON mapping, meaning it is not possible to translate the JSON payload into a SSZ-serialized payload.

The second major change is how we handle requests and responses on the server and client. While previously if you would request a state as SSZ from the server, the client would only be able to handle that format and if the server didn't support it, the request would fail. Now, we use content type negotiation between client and server which works based on headers and status codes to allow a more flexible and robust handling of requests and responses.

The third redesign in this PR is related to the API client, it is now possible to provide options like HTTP timeout per request and the response object is much more complete and self-contained, e.g. it's no longer needed to call ApiError.assert(...) everywhere (more examples below). This refactoring allows for further extensions in the future like a per route strategy for sending requests to fallback nodes and picking responses based on metadata, e.g. pick the most profitable block from all connected beacon nodes.

BREAKING CHANGES

While there is are no breaking changes to the validator client and beacon node, the client exported from @lodestar/api package changed significantly, and there are minor other changes to exported types but those are not as relevant for most consumers.

These are the most notable changes

Client initialization

before

const api = getClient({urls, getAbortSignal: () => opts.abortController.signal}, {config, logger});

after

const api = getClient({urls, globalInit: {signal: opts.abortController.signal}}, {config, logger});

Request with value

before

const res = await api.beacon.getGenesis();
ApiError.assert(res, "Can not fetch genesis data from beacon node");
const genesis = res.response.data;

after (.value() will assert if request was successful and throw detailed error)

const genesis = (await api.beacon.getGenesis()).value()

Request without value

before

ApiError.assert(await beaconClient.submitPoolVoluntaryExit(signedVoluntaryExit));

after (response object is self-contained, no utility like ApiError.assert is required anymore)

(await beaconClient.submitPoolVoluntaryExit({signedVoluntaryExit})).assertOk();

Passing arguments

before

api.validator.produceAttestationData(committeeIndex, slot)

after (args are passed as object, it is now possible to pass per request options, like timeout)

api.validator.produceAttestationData({committeeIndex, slot}, {timeoutMs: 4000})

TODO

  • review proofs routes, do we wanna follow spec proposal?
  • review lightclient routes, spec suggests to get fork version from attested_header slot, previous work from #4641? test against https://eth-light.xyz/, Access-Control-Expose-Headers: Eth-Consensus-Version header required?
  • review network identity route, use enr package / add ssz support?
  • move @param from method jsdoc to property
  • more documentation, update examples, update reasoning about API definitions
  • more http client tests, mostly in regard to retries
  • retry mechanism in http client if we receive a 415 we need to cache (per url x operationId) that server does not support it, resend different content type and on subsequent requests only sent supported content type
  • default per route config to use spec compliant content types, e.g. if spec does not define SSZ request body, do not use it by default. Should be configurable to always use SSZ in a Lodestar only setups
  • add option to server to disable JSON responses to avoid potential DoS vector, how to handle routes that do not support SSZ?
  • ensure Lodestar is compatible with latest checkpointz release (#164, #165)
  • final compatibility check with other clients and older Lodestar version using kurtosis

Closes #5128
Closes #5244
Closes #5826
Closes #6110
Closes #6564
Closes #6634

@nflaig nflaig requested a review from a team as a code owner May 17, 2024 13:32
@philknows philknows modified the milestone: v1.19.0 May 21, 2024
Copy link
Contributor

@nazarhussain nazarhussain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot to review further but sharing my initial review.

My major concern upto this point is the the lack of response codes from the routes definitions. Now use will never know any endpoint can response with 200, 206, 503, or 404.

We are putting alls calls in a promise box and saying it would be either fullfil or fail. But in case of failure user don't know what could be expected status codes.

This was a major information missing in earlier implementation and we added that in last refactor. I don't think loosing that context is a worthwhile.

}

export function createApiClientMethods<Es extends Record<string, Endpoint>>(
definitions: RouteDefinitions<Es>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use either routesDefinitions or routes instead of just definitions for clarity. The word definitions does is not understandable in context this function is used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the term "definitions" throughout the api package for route definitions, the function only has 1 line of code and we have the type name in the same line, should be clear enough.

Would prefer to keep as is for consistency

): ApiClientMethods<Es> {
return mapValues(definitions, (definition, operationId) => {
return createApiClientMethod(definition, client, operationId as string);
}) as unknown as ApiClientMethods<Es>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anywhere you have to use unknown as that highlights some hidden type complexity which may lead to problem in future. As it's all new code written so better to identify and address such points right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should try to avoid using unknown as, I previously looked at this already and gave it another attempt today but I don't see what's the issue, the types look good to me, seems to be some issue how mapValues remaps the types, the type error is pretty strange Type 'Es[keyof Es]' is not comparable to type 'Es[K]'

We have the same type cast on unstable branch

}) as unknown as ApiWithExtraOpts<Api>;

My guess is there is some issue with the typing of mapValues that makes the translation incorrect but not sure how to fix it right now, maybe you could take a look and help with that. If it's related to mapValues type might be better to fix on unstable branch first

return client.getBlockV2(blockId, format);
},
};
export function getClient(config: ChainForkConfig, httpClient: IHttpClient): ApiClientMethods<Endpoints> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method name and the return type is not matching. We asked for a client and return type says it's returning client methods. Would be nicer to align the type names with the functions where these are used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated return type name to better align with method name

} as ApiClientResponse<{[HttpStatusCode.OK]: Uint8Array}>;
}
return client.getState(stateId, format);
getState(args, init) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the second argument are request options or initialization options?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

second arg is RequestInit & LodestarExtras

eventSource.close();
onClose?.();
signal.removeEventListener("abort", close);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it's one time listener, so need to remove. This way you can use this listener as inline similars to all others used in this file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggestion to just remove the line

signal.removeEventListener("abort", close);

or anything else in addition to that?


export type Api = {
type AttestationList = ValueOf<typeof AttestationListType>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type is same as phase0.Attestation[] then why to declare with more complex patterns?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea is to always match the ssz type by deriving the ts type from it. In this specific case, I think it's even better if we stick to phase0.Attestation[] as for electra we will have to change it to allForks.Attestation[] and construct the ssz type dynamically based on the fork.

I am honestly in favor of changing it as well as you suggested, and remove all -List suffixed type unless those are more broadly used (outside routes file).

Those were added by @wemeetagain, let's get his thoughts first

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea is to always match the ssz type by deriving the ts type from it.

Yeah that was the idea. This is how its done in our types package, and my thought was that we may want to move all these types defined in the api over to our types package.

*/
getPoolProposerSlashings(): Promise<ApiClientResponse<{[HttpStatusCode.OK]: {data: phase0.ProposerSlashing[]}}>>;
getPoolProposerSlashings: Endpoint<
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

submitPoolBlsToExecutionChange(
blsToExecutionChange: capella.SignedBLSToExecutionChange[]
): Promise<ApiClientResponse<{[HttpStatusCode.OK]: void}, HttpStatusCode.BAD_REQUEST>>;
submitPoolBLSToExecutionChange: Endpoint<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should defer changing Bls to BLS from this PR and do it project wise later.

Copy link
Member Author

@nflaig nflaig May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rename is required to make the routes testable against the spec which is critical for this PR to ensure correctness.

I don't feel strongly about renaming this, it's just required here to match spec operationId.

We could open an issue to do a follow-up renaming of other parts in the code after this PR is merged.

req: {
writeReq: ({syncingStatus}) => ({query: {syncing_status: syncingStatus}}),
parseReq: ({query}) => ({syncingStatus: query.syncing_status}),
schema: {query: {syncing_status: Schema.Uint}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uint have different range but this parameter have different integer range 100-599.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same as on unstable, Schema.Unit will currently be translated to {type: "integer", minimum: 0};, we can add more accurate / ranged-based schema validation for integer values but this is the only API as far as I know that requires it.

We currently already enforce that the range is valid in server implementation

if (syncingStatus != null && (syncingStatus < 100 || syncingStatus > 599)) {
throw new ApiError(400, `Invalid syncing status code: ${syncingStatus}`);

url: "/eth/v1/node/syncing",
method: "GET",
req: EmptyRequestCodec,
resp: JsonOnlyResponseCodec,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we say our API have full SSZ support then it we should not have JsonOnlyResponseCodec expect the lodestar namespace.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the node namespace has several response schemas that don't cleanly map to ssz types.

Future steps would be to update the spec to better align to ssz everywhere.
If we require ALL endpoints to support SSZ before landing this feature, then we will likely never land this feature.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we say our API have full SSZ support

This is not the claim of this PR, but it provides first-class support for ssz on the api. What we can claim though is that we have full ssz support on all performance relevant apis, that includes mostly beacon and validator required apis.

If we require ALL endpoints to support SSZ before landing this feature, then we will likely never land this feature.

I don't even think this is practically useful. A lot of apis are not performance sensitive or mostly consumed by clients that do not even supports ssz, think dappnode querying the peer count.

I am not saying we shouldn't push for more apis to support ssz but there are limitations to it's usefulness in most of those cases.

@nflaig
Copy link
Member Author

nflaig commented May 23, 2024

My major concern upto this point is the the lack of response codes from the routes definitions. Now use will never know any endpoint can response with 200, 206, 503, or 404.

We are putting alls calls in a promise box and saying it would be either fullfil or fail. But in case of failure user don't know what could be expected status codes.

This was a major information missing in earlier implementation and we added that in last refactor. I don't think loosing that context is a worthwhile.

We could add this by extending the Endpoint type but I honestly kinda happy that those are removed.

The reasons why I think those were not great are

  • not correctness checked against spec, e.g. getHeader claims it only returns 200 while the spec also has 204 defined (the spec also has no 404 while our types do...)
  • I have only found a single case in the code where we actually use the status code directly
  • we don't enforce those status codes anywhere to ensure implementation correctness, in current unstable, it's not even possible to set the status code from the server impl (although we can do that now)
  • the information communicated by the status code types is not very detailed, e.g. it says what status codes can be returned but not why those might be returned, making it hardly actionable without looking at the spec which contains more details

I personally don't think we should add status codes to each endpoint for reasons listed above but if we feel strongly about it could still be done.

@nflaig
Copy link
Member Author

nflaig commented May 24, 2024

Request times from validator client point of view stable (on the left) vs. feat1 (on the right)

Last 7 days, rate interval 1d

getAttesterDuties

image

getProposerDuties

image

prepareBeaconCommitteeSubnet

image

produceAttestationData

image

submitPoolAttestations

image

produceBlockV3

image

publishBlockV2

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta-breaking-change Introduces breaking changes to DB, Validator, Beacon Node, or CLI interfaces. Handle with care!
Projects
None yet
4 participants