Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional Content-Location response header #159

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nigoroll
Copy link
Contributor

@nigoroll nigoroll commented Sep 6, 2020

... to inform the Client about the permanent location of completed uploads.

Continues #155

This is motivated by the following use cases:

  • TUS servers may not implement storage themselves, such that the Location returned for a creation (POST) request be only transitional.

  • TUS servers may decide to change the storage location once the upload is complete, for example to

    • implement content-based naming (where the file name is some hash over the content)
    • name content based on type
    • or select paths by size

The Content-Location header is already defined by the HTTP RFCs, so, stricly speaking, its use should be implicitly allowed and well defined. Yet, for TUS, it makes sense to allow use of this response header for additional HTTP verbs (e.g. PATCH) and it might help implementation interoperability to point it out.

@smatsson
Copy link

smatsson commented Sep 8, 2020

I have had this requested for tusdotnet several times so glad you added it to the spec!

Some thoughts from a generic lib point of view:

The Server MAY send a Content-Location header with any response once
the upload is complete to optionally indicate to the Client the
permanent storage location of the upload.

  • "With any response" is a bit ambiguous. How do I as a client know when to expect the response? Can I receive the Content-Location header from a DELETE request? I would imagine that the Content-Location header would be know once the file is completely uploaded, so it would be saved in the last PATCH request. Would it make sense to only include it in PATCH and HEAD responses?
  • "Permanent storage" - How do we define permanent? Would "... indicate to the client the location of the upload for further processing" be better?

See RFC 7231 Section
3.1.4.2
.

I like that you linked to the RFC 👍

@nigoroll
Copy link
Contributor Author

nigoroll commented Sep 8, 2020

@smatsson thank you, I have changed the wording based on your feedback.

I have now specified the methods to send a Content-Location for, but actually I am not so sure if this is better. Sure, it does not make sense for a DELETE, limiting the methods brings with it the risk that we forget to include future cases. At any rate, POST could be used for Creation With Upload, so it needs to be included.

But you are definitely right regarding the word "permanent", that was not a good choice. From the protocol perspective, we should not be concerned about the future, we are just saying "this is the new location, deal with it".

How do I as a client know when to expect the response?

This touches a fundamental question, and I think the right answer is: you can't know in the general case. The server might continue to serve the content under the upload url and I do not think the protocol should make the Content-Location response header mandatory. I guess it really depends on the specific application use case if a client should expect the header.

@smatsson
Copy link

I have now specified the methods to send a Content-Location for, but actually I am not so sure if this is better. Sure, it does not make sense for a DELETE, limiting the methods brings with it the risk that we forget to include future cases. At any rate, POST could be used for Creation With Upload, so it needs to be included.

I agree with you here, the previous wording was better. I'm also not a fan of referring to GET requests in the spec as it's out of scope for the protocol.

This touches a fundamental question, and I think the right answer is: you can't know in the general case. The server might continue to serve the content under the upload url and I do not think the protocol should make the Content-Location response header mandatory. I guess it really depends on the specific application use case if a client should expect the header.

Hmm, after taking a step back, and not to diss you proposal here, but should this really be added to the spec? It is unknown when to expect the header and it's unknown what to do with the header once received. The only way to know is to have a contract with the specific server. Since there is nothing in the spec prohibiting the server from adding whatever header they feel like to any response, this seems to fit the bill better as an app specific thing?

... to inform the Client about the permanent location of completed
uploads.

This is motivated by the following use cases:

* TUS servers may not implement storage themselves, such that the
  `Location` returned for a creation (POST) request be only
  transitional.

* TUS servers may decide to change the storage location once the upload
  is complete, for example to

  * implement content-based naming (where the file name is some hash
    over the content)

  * name content based on type

  * or select paths by size

The `Content-Location` header is already defined by the HTTP RFCs, so,
stricly speaking, its use should be implicitly allowed and well defined.
Yet, for TUS, it makes sense to allow use of this response header for
additional HTTP verbs (e.g. `PATCH`) and it might help implementation
interoperability to point it out.
@nigoroll
Copy link
Contributor Author

Re @smatsson OK, I have reverted to basically ebf5701, but avoiding to talk about permanent location.

Yes, I agree, this is merely a pointer to the appropriate place in the HTTP RFCs, and it is true that any server could, with good reason, expect a client to handle the response correctly. In reality, however, implementors hardly ever read RFCs in their entirety and #155 is proof that it even took me quite a long time to come up with a reasonable suggestion. On top, we

  • would be giving a hint to client implementors that they should be prepared to handle Content-Location and
  • having the hint in the specs could improve compatibility and help avoid competing implementations.

@smatsson
Copy link

I might be missing something but I don't really follow why this goes in the spec? It seems very domain specific to me.

The server is free to include any header at any time given the current protocol (obviously except the ones that are reserved in the protocol). The headers reserved in the protocol are all well defined when it comes to when to expect them and what to do with them when they are received. In this proposal it seems that it's unknown for the client when the header can be received and it's unknown what to do with the header once received. Is it the URL where the client can download content? Is it the URL to use when polling for e.g. video encoding progress? Is the client supposed to continue all tus operations on the new location? To answer any of these questions the client will need to check the client/server agreement (or application-specific/domain-specific protocol or whatever one would call it).

I'm not saying that I don't find the functionality useful (e.g. for your hash based content storage) but I'm not entirely sure that it's a general purpose feature that should be in the spec, given the above issues on how to interpret the header.

In your use case in #155, how would the clients use this header and for what?

@nigoroll
Copy link
Contributor Author

re @smatsson

In this proposal it seems that it's unknown for the client when the header can be received and it's unknown what to do with the header once received.

The client MAY receive the header and IF it does, it should refer to the resource via the URL given in the Content-Location: header. As this is the case only for completed uploads, this refers to requests after the TUS upload is finished, which would not use TUS (e.g. HTTP GET). If this does not become clear from the proposed wording, we might want to consider adding another sentence making it clearer.

Is it the URL to use when polling for e.g. video encoding progress?

I understand this was a rhetorical question, but the technical answer would be: As Content-Location: refers to the same resource as the upload, it could not imply something unrelated as this example.

Is the client supposed to continue all tus operations on the new location?

Again, as the Content-Location: is concerned about the specific upload, no.

To answer any of these questions the client will need to check the client/server agreement (or application-specific/domain-specific protocol or whatever one would call it).

No. If I failed to phrase the proposal well enough, I need to improve. But this definitely should be clear from the protocol text.

I'm not entirely sure that it's a general purpose feature that should be in the spec

see also #159 (comment)

Please do also keep in mind that the semantics of the Content-Location: header should already be defined by RFC7231, so if we were to be pedantic about it, a TUS client would only be HTTP compliant if it already handled the header correctly. Again, I am not saying that this expectation would be realistic by any stretch, so I think that mentioning the header in the TUS protocol is appropriate (and, in my mind, the section added is merely more than a mention). To quote the relevant bit:

For a state-changing request like PUT (Section 4.3.4) or POST (Section 4.3.3), it implies that the server's response contains the new representation of that resource, thereby distinguishing it from representations that might only report about the action (e.g., "It worked!"). This allows authoring applications to update their local copies without the need for a subsequent GET request.

In your use case in #155, how would the clients use this header and for what?

Say the Upload-URL (to POST to) to was https://server/upload, then the upload location (as returned in the Location: response header of the POST) might be https://server/upload/abcdef. Then, with the PATCH concluding the upload to http://server/upload/abcdef, the server might return Content-Location: https://anotherserver/1234. The client should then access the stored object under https://anotherserver/1234

More specifically to my use case, 1234 would be bucket and a content-hash (sha256) and anotherserver would be a ceph s3 service.

@rockwotj
Copy link

Hello! I'd like to hop into this discussion and note that I'd like to extend this a bit further and allow the server to allow with arbitrary metadata (See #164) once an upload is complete. I think that this would be more flexible than what this is proposing, and I'd be OK with "well" known values for common stuff like Content-Location, but in my case we're returning things like exif data and image dimensions. There can also be enough metadata that it could be a pretty large header, and I know some proxies have restrictions on header lengths that may be interesting to design around.

@Acconut
Copy link
Member

Acconut commented Sep 16, 2020

A brief request: @nigoroll, it would be nice if you could avoid force-pushing commits to your PRs as it makes it impossible for others to see the history of changes. For example, you were discussing whether the specific HTTP methods should be included in the specification but since there is only one commit in this PR, I cannot see what exactly you talked about. We squash PRs in the end anyways, so the entire PR will become one commit anyways. Hope that's understandable :)

Regarding the topic itself: As @rockwotj mentions, it may be helpful to make a more generic post-processing extension. Right now, this PR mostly talks about providing the new content location to the client, which is helpful if the server moves resources to other storage. However, I think that this is a very specific use case and since we have had many requests about dealing with post-processing in the past, it might be a good opportunity to attack now.

My idea would be that a server could tell the client an URL under which additional information about the post-processing status is available. Maybe we could reuse the Link header for that (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Link). The tus clients can then implement features to automatically fetch this status and provide it to the application. Since moving uploads to another storage is a type of post-processing, this approach could also be used to tell the client the new location. However, the mechanism is also flexible enough that it allows for other use cases, such as @rockwotj mentions.

Does this make sense?

@rockwotj
Copy link

@Acconut this sounds good to me 👍

@smatsson
Copy link

@nigoroll Thanks for the explanation! That clears it up a lot. To be honest I still don't think this should be part of the tus protocol, mainly because it is already defined in the RFC and as you said it should already be supported by the client and server. I think that @Acconut's proposal is a better fit to allow more options on post processing than just where to download the file content.

@nigoroll
Copy link
Contributor Author

re @Acconut force-pushing is a habit from other projects which works because gh keeps and logs previous commit hashes for PRs. But I will try to remember that this is not how this project handles things. FTR, up to now the versions of this PR are:

  • ebf5701 initial
  • 32ab6ba explicit method mention
  • b3e3416 back to original, but slightly different wording

The link (based on the link html element) is intended to provide just that, links. So it seems to me like a sensible choice for a post processing status URL, but I do not see how it would help with metadata, which is a property of the uploaded resource and not one of a resource where additional status information can be queried. (#164 seems to be a bit special in that @rockwotj said he had so much metadata that it would not fit in a header, so for that case I actually do agree that a link to another resource describing the upload resource could be an option).

With respect to Content-Location, I am not sure if the direction the discussion is heading to is in the spirit of HTTP. In my mind, we should aim to make use of existing standards where we can.

So, to summarize: put links in the Link header and the content-location in the Content-Location header.

@Acconut
Copy link
Member

Acconut commented Sep 18, 2020

I am open for adding a note about the Content-Location header to the specification and I will start a discussion about a separate post-processing extension in a new GitHub issue.

However, I would change the wording of this PR slightly. @nigoroll, do you mind if I add a commit?

@nigoroll
Copy link
Contributor Author

@Acconut sorry for not responding earlier, please feel free

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants