Spec Improvement for the broader GraphQL Ecosystem #55

dylanowen · 2021-06-14T23:48:10Z

I'd like to reopen the discussion on @enjoylife's suggested simplification of the spec in #50. I believe their suggestion can greatly simplify this specification and ease the implementation for typesafe frameworks, while still achieving the original goals for JS GraphQL libraries.

Just for clarity, this is a rough draft of how I imagine the new version of the spec could look building off of what @enjoylife has defined:

- Requests are multipart/form-data.
- There is exactly 1 non-file form field. This contains the GraphQL request. The GraphQL server has a Scalar backed by a String that references the name of the related file form field.
  - It doesn't need to be named operations anymore, but it might be better to reserve this name for further extension.
  - It doesn't need to come before the file form fields, but it might be best to enforce this for performance.
  - We might not need to specify the name of our scalar, but Upload seems good.
- Every other form field should be a file with a unique name, which will be referenced by the Upload scalar.

ex:

curl http://localhost:4000/graphql \
  -F gql='{ "query": "mutation { upload(files: [\"file_id\"]) }", "variables": null }' \
  -F file_id=@<file>
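
As a rough sketch of the rules in this draft, a server could check the shape of the request before executing it. The `Part` type and the `validateParts` function are hypothetical names used only for illustration; a real server would receive its parts from whatever multipart parser it uses.

```typescript
// Hypothetical, simplified view of one multipart/form-data part.
type Part =
  | { kind: "field"; name: string; value: string }
  | { kind: "file"; name: string };

// Returns the GraphQL request body and the file form field names, or throws
// if the request does not have the shape described in the draft.
function validateParts(parts: Part[]): { request: string; fileNames: string[] } {
  let request: string | undefined;
  let fieldCount = 0;
  const fileNames: string[] = [];

  for (const part of parts) {
    if (part.kind === "field") {
      // The single non-file form field carries the GraphQL request.
      fieldCount += 1;
      request = part.value;
    } else if (fileNames.indexOf(part.name) !== -1) {
      // Every file form field needs a unique name to be referenced by.
      throw new Error(`Duplicate file form field "${part.name}"`);
    } else {
      fileNames.push(part.name);
    }
  }

  if (fieldCount !== 1 || request === undefined) {
    throw new Error(`Expected exactly 1 non-file form field, found ${fieldCount}`);
  }
  return { request, fileNames };
}
```

This version buffers the list of parts for simplicity; a streaming server would apply the same checks as each part arrives.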

I've implemented a prototype of this in my own Scala server and I have a very rough implementation of this for Apollo:

Prototype Apollo Attachments Gist

Regarding the comments in #11 (describing how the map field is necessary for performance), this is an implementation detail of the server and something we can solve for Apollo. If our Apollo plugin finds an Upload(file_id), we grab a promise from our shared Uploads object. That promise resolves once we parse the matching file, or rejects once we finish parsing the entire request without finding it. This lets us execute our GraphQL request as soon as we find it in our form fields.
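
A minimal sketch of that shared object, written from the description here rather than copied from the gist. The class and method names (`Uploads`, `get`, `fileParsed`, `requestFinished`) are assumptions for illustration.

```typescript
// A promise together with the functions that settle it.
interface Deferred<T> {
  promise: Promise<T>;
  resolve: (value: T) => void;
  reject: (reason: Error) => void;
}

// Shared between the multipart parser and the Upload scalar for one request.
class Uploads<T> {
  private pending = new Map<string, Deferred<T>>();
  private finished = false;

  private missing(id: string): Error {
    return new Error(`File "${id}" was not found in the request`);
  }

  private entry(id: string): Deferred<T> {
    let deferred = this.pending.get(id);
    if (deferred === undefined) {
      let resolve!: (value: T) => void;
      let reject!: (reason: Error) => void;
      const promise = new Promise<T>((res, rej) => {
        resolve = res;
        reject = rej;
      });
      // Avoid unhandled-rejection warnings for files nobody ends up awaiting.
      promise.catch(() => undefined);
      deferred = { promise, resolve, reject };
      this.pending.set(id, deferred);
      // A file requested after parsing has ended can never arrive.
      if (this.finished) reject(this.missing(id));
    }
    return deferred;
  }

  // Called by the Upload scalar: may run before the file has been parsed.
  get(id: string): Promise<T> {
    return this.entry(id).promise;
  }

  // Called by the multipart parser each time a file part is parsed.
  fileParsed(id: string, file: T): void {
    this.entry(id).resolve(file);
  }

  // Called once the whole request is parsed: reject whatever is still pending.
  // Rejecting an already resolved promise has no effect.
  requestFinished(): void {
    this.finished = true;
    this.pending.forEach((deferred, id) => deferred.reject(this.missing(id)));
  }
}
```

Because `get` and `fileParsed` both create the entry on demand, the order in which the resolver and the parser reach a given file does not matter.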

This is a trace from running my gist:

curl http://localhost:4000/graphql \
  -F gql='{ "query": "mutation { upload(files: [\"file1\", \"file2\", \"file1\"]) }", "variables": null }' \
  -F file1=@Untitled \
  -F file2=@API\ Wizard.mp4

You can see that we've achieved the same async waterfall where our GraphQL request execution starts immediately.

> The first thing that comes to mind (although it's a pretty exotic) is that the current spec allows files to be used anywhere in the GraphQL operations objects, not just in variables

Yes, each file is always referenced by its uid, so your server can choose to arrange its JSON however it desires without any issues.

An added benefit of this proposal over the current spec is the ability to define file references outside of variables. Right now you're required to always have a "variables" section to reference via your map form-field. It's not possible to send something like:

curl http://localhost:4000/graphql \
  -F gql='{ "query": "mutation { upload(files: [\"file_id\", \"file_id\"]) }", "variables": null }' \
  -F file_id=@<file>

> With your proposal that doesn't include a map of where files are used in the operations, it's not clear to me how one file variable used as an argument in multiple mutations in the same request could be implemented on the server. Is that something you have considered?

This doesn't really change between the current spec and this proposal. You're always looking up in your context for the file based on its uid. There's no reason you can't repeatedly query the same file based on its uid.

ex:

curl http://localhost:4000/graphql \
  -F gql='{ "query": "mutation { a: upload(files: [\"file_id\"]) b: upload(files: [\"file_id\"]) }", "variables": null }' \
  -F file_id=@<file>

> Performance wise the map allows the server to cheaply see up front how many files there are, and where they are expected to be used in the operation without having to parse the GraphQL query looking for certain upload scalars in an AST, etc. For example, the map makes implementing the maxFiles setting in graphql-upload trivial.

Although this is true of the new spec change, we'll always be parsing GraphQL requests in GraphQL servers, so it's a matter of leveraging the server libraries to facilitate this. This is something that could maybe be handled by an Apollo validationRule, or definitely by an Apollo plugin. We're writing a spec for GraphQL; we should be using the tools our GraphQL servers provide to us.

Even if we're writing an implementation of this spec for a framework that gives 0 options to validate our GraphQL request, the current JS spec implementation has already defined code that would catch maxFiles as they were streaming through via Busboy: https://github.com/jaydenseric/graphql-upload/blob/2ee7685bd990260ee0981378496a8a5b90347fff/public/processRequest.js#L67
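
To illustrate the streaming approach, here is a small sketch of a limit that is enforced as file parts arrive, without any map. It is not the graphql-upload code; `FileLimiter` and `onFilePart` are hypothetical names, and a real server would call `onFilePart` from its multipart parser's file event.

```typescript
// Counts file parts as they stream in and fails as soon as the limit is passed.
class FileLimiter {
  private readonly maxFiles: number;
  private seen = 0;

  constructor(maxFiles: number) {
    this.maxFiles = maxFiles;
  }

  // Call once for every file form field the parser encounters.
  onFilePart(fieldName: string): void {
    this.seen += 1;
    if (this.seen > this.maxFiles) {
      throw new Error(
        `File form field "${fieldName}" exceeds the limit of ${this.maxFiles} files`
      );
    }
  }
}
```

The trade-off is that the limit is detected when the extra file starts arriving rather than up front, but nothing beyond a counter is needed.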

> The point of this spec is to create a standard for interoperability between all GraphQL clients and servers, regardless of the languages or ecosystems

Exactly, this spec appears to be designed in order to run as a JS Server middleware. There is a good amount of indirection, implementation specific solutions, and dependencies on the implementing language/framework. This all creates more work for server implementers.

I did an audit of the various server implementations, and all of the ones I looked at either:

- Depend on their language being dynamic
- Depend on their language having a top Object type and casting (ignoring type safety)
- Or basically implement the proposed spec change internally

There doesn't seem to be a good way to add this specification to a typesafe language/framework without it devolving into the proposed spec change.

For Example `async-graphql` in Rust

https://github.com/async-graphql/async-graphql/blob/f62843cbd34ef9bf28f70f8df07d4f61f8038e0a/src/request.rs#L115

*variable = Value::String(format!("#__graphql_file__:{}", self.uploads.len() - 1));

We can see that internally, after the map is parsed, we replace the null inside the variables definition with a uid to reference the specific file.

https://github.com/async-graphql/async-graphql/blob/f62843cbd34ef9bf28f70f8df07d4f61f8038e0a/src/types/upload.rs#L99

/// Get the upload value.
pub fn value(&self, ctx: &Context<'_>) -> std::io::Result<UploadValue> {
   ctx.query_env.uploads[self.0].try_clone()
}

When we get the value out of the Scalar, we pull the actual file stream out of the context via that same uid.

For Example `caliban` in Scala

https://github.com/ghostdogpr/caliban/blob/660beeae538768d817a73cb4535e0e3bd1a8cb82/adapters/play/src/main/scala/caliban/uploads/Upload.scala#L89

// If we are out of values then we are at the end of the path, so we need to replace this current node
// with a string node containing the file name
StringValue(name)

We're setting our variable value to the filename in order to pull it out of the context later.

For Example `sangria` in Scala

(This isn't actually in the library but this gist describes how to implement it).

https://gist.github.com/dashared/474dc77beb67e00ed9da82ec653a6b05#file-graphqlaction-scala-L54

(GraphQLRequest(gql = gqlData, upload = Upload(mfd.file(mappedFile._1)), request = request))

We store our uploaded file separately from our GraphQL request.

https://gist.github.com/dashared/474dc77beb67e00ed9da82ec653a6b05#file-controller-scala-L68

userContext = SangriaContext(upload, maybeUser),

We pass our file through via the context.

https://gist.github.com/dashared/474dc77beb67e00ed9da82ec653a6b05#file-exampleschema-scala-L15

val maybeEggFile = sangriaContext.ctx.maybeUpload.file.map(file => Files.readAllBytes(file.ref))

Inside of our resolver we look up the file in the context to actually use it.

All of these examples have implemented @enjoylife's proposal under the covers in order to preserve some form of type safety.
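
The common pattern in these libraries can be sketched as follows: the scalar only ever holds a string ID, and the file itself is looked up in the request context. All of the names here (`UploadedFile`, `RequestContext`, `Upload`, `parseUpload`) are hypothetical and not taken from any of the libraries above.

```typescript
// Hypothetical file handle; a real server would expose a stream here.
interface UploadedFile {
  filename: string;
  contents: string;
}

// Per-request context holding every file part, keyed by its form field name.
interface RequestContext {
  uploads: Map<string, UploadedFile>;
}

// The Upload scalar is only ever a string ID, so its type never changes.
class Upload {
  readonly id: string;

  constructor(id: string) {
    this.id = id;
  }

  // Resolve the ID against the context when the resolver needs the file.
  value(ctx: RequestContext): UploadedFile {
    const file = ctx.uploads.get(this.id);
    if (file === undefined) {
      throw new Error(`No file form field named "${this.id}"`);
    }
    return file;
  }
}

// Parsing the scalar needs no dynamic cast: anything but a string is rejected.
function parseUpload(input: unknown): Upload {
  if (typeof input !== "string") {
    throw new Error("Upload must be a string file ID");
  }
  return new Upload(input);
}
```

Since the variable value stays a plain string from the wire to the resolver, nothing has to be overwritten inside the already parsed request.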

Note:

We can use these libraries as a guide to show us how to implement supporting both the current version of the spec and the proposed change in the same server with plenty of code sharing.
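
One way to get that code sharing, sketched under the assumption that the server handles the proposed format natively: a request in the current format is rewritten up front by replacing each null placeholder named in the map with the file's form field name. `inlineFileIds` is a hypothetical helper, loosely modelled on what the libraries above do internally.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Rewrites a current-spec request (operations + map) into the proposed shape.
// The input is not modified; a rewritten deep copy is returned.
function inlineFileIds(operations: Json, map: { [fieldName: string]: string[] }): Json {
  const result: Json = JSON.parse(JSON.stringify(operations));

  Object.keys(map).forEach((fieldName) => {
    (map[fieldName] || []).forEach((path) => {
      // Paths look like "variables.files.0" in the current spec.
      const segments = path.split(".");
      const last = segments.pop() as string; // split() never returns an empty array

      // Walk down to the parent of the placeholder.
      let node: any = result;
      for (const segment of segments) {
        node = node === null || typeof node !== "object" ? undefined : node[segment];
      }
      if (node === null || typeof node !== "object" || node[last] !== null) {
        throw new Error(`Path "${path}" does not point at a null placeholder`);
      }
      // The placeholder becomes the ID that the Upload scalar will look up.
      node[last] = fieldName;
    });
  });

  return result;
}
```

After this step both formats reach the rest of the server in the same shape, so the scalar and the context lookup can be shared.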

For Reference `graphql-upload` in JavaScript

This JavaScript implementation depends on the language being dynamic so that we can overwrite our variables with an Upload instance.

https://github.com/jaydenseric/graphql-upload/blob/60f428bafd85b93bc36524d1893aa39501c50da1/public/processRequest.js#L232

operationsPath.set(path, map.get(fieldName));

We assign the Upload instance to the specified location in our GraphQL JSON request.

https://github.com/jaydenseric/graphql-upload/blob/60f428bafd85b93bc36524d1893aa39501c50da1/public/GraphQLUpload.js#L81

if (value instanceof Upload) return value.promise;

When parsing our Scalar value, we check and cast to make sure we found an Upload instance.

For Reference `graphql-java-servlet` in Java

This Java implementation depends on the top level Object type so that we can check and cast our variables on the fly.

https://github.com/graphql-java-kickstart/graphql-java-servlet/blob/eb4dfdb5c0198adc1b4d4466c3b4ea4a77def5d1/graphql-java-servlet/src/main/java/graphql/kickstart/servlet/GraphQLMultipartInvocationInputParser.java#L138

objectPaths.forEach(objectPath -> VariableMapper.mapVariable(objectPath, variables, part));

We set each http.Part in our variable map.

https://github.com/graphql-java-kickstart/graphql-java-servlet/blob/eb4dfdb5c0198adc1b4d4466c3b4ea4a77def5d1/graphql-java-servlet/src/main/java/graphql/kickstart/servlet/apollo/ApolloScalars.java#L28

if (input instanceof Part) {
  return (Part) input;

When parsing our Scalar, we check and cast to make sure we found an http.Part instance.

> This spec and the graphql-upload JS server-side implementation are not tied in any way to Apollo, or a "heavy js graphql abstraction"

I can't speak for @enjoylife but I don't believe the proposed changes to this spec are implying the code for graphql-upload is heavy. graphql-upload is quite elegant in its implementation. In fact, for my Apollo prototype I borrowed heavily from graphql-upload. The heavy parts are that:

- We have multiple implementation-specific details baked into the specification.
- Using null as a placeholder is really another server implementation detail; it doesn't make sense from the client perspective.
- There is a lot of indirection in the variables in order to support implementing GraphQL Upload libraries inside of JS middleware.

In Summary

"The point of this spec is to create a standard for interoperability between all GraphQL clients and servers, regardless of the languages or ecosystems", yet the current iteration of this specification constrains non-dynamic languages so that it can be implemented inside of JS server middleware. Evolving this specification will better fit the growing GraphQL ecosystem and make this specification future-proof, so that everybody can benefit from the work you've done here.


Erik1000 · 2021-09-02T15:20:15Z

I'd like to add that the spec currently defines that the GraphQL query itself, as well as the mapping for the files, are JSON encoded. Since the GraphQL spec explicitly does not define an encoding, this spec shouldn't either. For example, a server that uses CBOR instead of JSON for its "normal" GraphQL queries then has to decide whether the mapping for the files should be CBOR too, or JSON.

This is not great and would cause different behaviour across implementations where one expects json and the other the same encoding as everywhere else.

Erik1000 · 2021-09-06T20:05:39Z

Also why not expand the spec for responses? There are probably some cases where an external CDN is too much but just putting the stuff directly in a field in the response isn't great either.

jaydenseric · 2021-09-07T01:43:13Z

@Erik1000 regarding #55 (comment)

> the [GraphQL multipart request] spec currently defines that the GraphQL query itself, as well as the mapping for the files, are JSON encoded. Since the GraphQL spec explicitly does not define an encoding, this spec shouldn't either.

The official GraphQL Foundation GraphQL over HTTP spec (still a draft) does in fact require servers and clients to support JSON:

> Servers and clients over HTTP MUST support JSON and MAY support other, additional serialization formats.
— https://github.com/graphql/graphql-over-http/blob/main/spec/GraphQLOverHTTP.md#serialization-format

The point of the GraphQL multipart request spec is to allow all sorts of clients to send file uploads to all sorts of GraphQL APIs; if we have different versions of a spec for serialization formats other than JSON then it undermines this goal. JSON is by far the easiest to work with in browser code, where the size of your code and dependencies really matters for performance. I think it's safe to assume almost all server environments have the means to process JSON, but JSON is one of the only serialization formats clients can elegantly work with (without introducing third party library bloat).

Regarding #55 (comment)

> Also why not expand the spec for responses?

This was one of the first things considered years ago during early experimentation, but I haven't met anyone that wants or needs this yet. If it's something you want to experiment with, keep in mind that there is a GraphQL over HTTP RFC for incremental delivery that specifies multipart/mixed GraphQL responses. It doesn't interfere with multipart GraphQL requests because it's for responses:

graphql/graphql-over-http#124 (comment)

But if you plan to experiment with multipart GraphQL responses for file downloads within the response, that RFC might need to be accounted for, since a user might want both file downloads and incremental delivery.

Erik1000 · 2021-09-09T12:44:09Z

> The official GraphQL Foundation GraphQL over HTTP spec (still a draft) does in fact require servers and clients to support JSON:

I see, I didn't know about this spec. Why not write it like the spec itself? E.g. JSON MUST be supported, but a server MAY also accept everything serialized in something else (like CBOR).

As for incremental delivery, I'll look into that, seems interesting!

dylanowen mentioned this issue Jun 14, 2021

Simple alternative if you are not tied to the JS graphql ecosystem. #50

Closed

dylanowen mentioned this issue Aug 1, 2022

Consider integrating or referencing GraphQL multipart spec graphql/graphql-over-http#7

Open

dylanowen linked a pull request May 9, 2024 that will close this issue

Version 3 #72

Open
