
new pragma @upload for user generated content #1475

brianleroux opened this issue Jan 18, 2024 · 12 comments
brianleroux commented Jan 18, 2024

Currently, Architect assumes that if you are building a web app you probably want/need a blob store (an S3 bucket) for static assets related to the web app's chrome (e.g. fonts, CSS, images). This is represented by the @static pragma; by default, and per common convention, we deploy assets found in /public to that bucket. This has worked well, but it isn't well suited to user generated content that an app may (or may not) want its end users to upload.

Architect needs to handle file uploads for, broadly, two use cases:

  1. providing user generated content that is publicly accessible (a common case would be user avatars)
  2. providing user generated content that IS NOT publicly accessible (often an uploaded document that may be sensitive)

Uploading needs to be a one-way trip directly into an S3 bucket, to avoid the payload size limits of Lambda/API Gateway, and uploads should generally be considered 'raw' input for processing into other destinations. We don't want to mix the userland upload space with assets that have been processed, for a couple of reasons: security (mainly), and to avoid accidentally creating a recursive Lambda invocation. Assets should be uploaded, processed by a Lambda that writes results elsewhere, and removed from the upload bucket.
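The upload-then-process flow just described might look roughly like this in the handler (a sketch: the event shape is the standard S3 notification format, but `processObject` and the destination logic are hypothetical userland code, not part of Architect):

```javascript
// Sketch of a src/upload handler: receives S3 notification events for the
// raw upload bucket, processes each object, then removes the original.
// The actual read/write/delete S3 calls are left to userland (omitted here).

// Pull bucket name and object key out of a standard S3 notification event.
// S3 keys arrive URL-encoded with '+' standing in for spaces.
function parseS3Records (event) {
  return (event.Records || [])
    .filter(record => record.eventSource === 'aws:s3')
    .map(record => ({
      bucket: record.s3.bucket.name,
      key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' ')),
      size: record.s3.object.size,
    }))
}

// Lambda entry point (wiring to real S3 calls intentionally omitted)
async function handler (event) {
  for (let { bucket, key, size } of parseS3Records(event)) {
    // 1. read the raw object from the upload bucket
    // 2. write processed results to the public and/or private bucket
    // 3. delete the raw object so the upload bucket stays empty
    console.log(`would process s3://${bucket}/${key} (${size} bytes)`)
  }
}

module.exports = { handler, parseS3Records }
```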

Proposed solution

We add a new pragma to Architect:

@app
myapp

@http
get /

@upload 
src /src/upload # this is also a default and can be omitted if the filesystem matches
endpoint /_content 
private true

Implementation

  • The existence of @upload will create a bucket resource named UploadBucket, and arc.services() will allow it to be discovered under upload.raw
  • Running arc init will generate a Lambda handler function in src/upload by default (the code path can be overridden with src), which will receive events for all uploads
  • Adding endpoint creates another new resource named PublicBucket, and arc.services() will allow it to be discovered under upload.public
  • Additionally, adding endpoint will create an API Gateway → S3 proxy (the same as @static adding _static in Architect)
  • Adding private true creates another new resource named PrivateBucket, and arc.services() will allow it to be discovered under upload.private

Most cases would use either endpoint or private, but a more sophisticated application could have both destinations for uploads (e.g. an app supporting both user avatars and user content that isn't necessarily public).

Additionally, we will want a helper in @architect/functions for generating the upload form, because it is pretty weird.
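Such a helper might, for instance, turn the fields of an S3 presigned POST into the hidden inputs the browser form has to submit ahead of the file input (a sketch; these function names are hypothetical, not a proposed API):

```javascript
// Hypothetical sketch of an upload-form helper: given the presigned POST
// fields (key, policy, signature, credential, etc.), render the hidden
// inputs that must precede the file input in the multipart form.
function uploadFormFields (fields) {
  return Object.entries(fields)
    .map(([ name, value ]) => `<input type="hidden" name="${name}" value="${value}">`)
    .join('\n')
}

function uploadForm ({ action, fields }) {
  return `<form action="${action}" method="post" enctype="multipart/form-data">
${uploadFormFields(fields)}
<input type="file" name="file">
<button>Upload</button>
</form>`
}
```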

Finally, to complete the feature we'll want to support basic S3 operations in @architect/sandbox for local testing. We could put the local PrivateBucket and PublicBucket in /tmp or equivalent.

Docs notes

Uploading directly to S3 is kind of weird: it requires creating a temporary signed URL with root account credentials. We will need to make this extra clear in the docs.

Alternatives and additional context

andybee commented Jan 18, 2024

Having done very similar things myself via plugins, I'd hugely appreciate this becoming part of the core Architect framework. This proposal covers all of my use cases (and more).

One question I have - will this also mean Sandbox support for mocking S3 so we can test uploads locally?

@macdonst

@brianleroux my only comment is how do we handle redirecting uploads to Public or Private buckets if both exist in the app?

@ryanbethel

I think this approach sounds great. Those two use cases (user generated public and private) cover the majority of needs I have seen.

For additional context we have solved this a few times by reusing the static asset bucket.
The image plugin does it by processing images and writing them back into a subfolder in the static bucket. The key to that is setting an ignore pattern in @static so that the subfolder is not accidentally purged.
https://github.com/enhance-dev/arc-image-plugin
We wrote about this pattern in the Begin blog showing how to use multipart form uploads to save files to the static bucket.
https://begin.com/blog/posts/2023-02-08-upload-files-in-forms-part-1

To be clear, I don't think these are good solutions. They were hacks to work around Architect/Begin limitations; I'm just adding them for context.

tbeseda commented Jan 18, 2024

I really like the @upload pragma name 👍

My only suggestion is considering if that may collide with another primitive that Arc would add later. As a general concept, will we wish that "upload" was available?
Personally, I can't think of any other features where the word upload would be a better descriptor than what you're proposing here 😅
And AMZN's naming isn't exactly very literal. Filtering this list with "upload" returns zero results, so that's good https://aws.amazon.com/products/

@brianleroux

@andybee def want to add sandbox support / thx for mentioning! Will add to the issue.

@macdonst all uploads go into the UploadBucket and end up in either (or both of) the PrivateBucket and PublicBucket / this way we can secure them separately and not run into accidentally recursively invoking Lambda (by writing to the same place we upload to).
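One way the src/upload handler could branch between the two destinations is by key prefix (a hypothetical userland sketch; Architect does not prescribe any routing):

```javascript
// Hypothetical routing: decide whether a raw upload should be copied to the
// public or the private bucket, based on its key prefix. Purely userland
// code inside src/upload, not something Architect would impose.
function destinationFor (key) {
  if (key.startsWith('avatars/')) return 'public'
  return 'private' // default to private: safer for unknown content
}
```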

@brianleroux

Oh wait, sorry @macdonst, realizing I just re-stated the above here. Let me try to explain better! The process for landing the upload somewhere is up to you (code in src/upload) … we can't really predict how folks will want to process files, so that's on them to do the writes. I could see adding helpers in @architect/functions based on aws-lite/s3 for the basic CRUDL stuff.

import arc from '@architect/functions'
await arc.upload.private.list() // etc

@macdonst

@brianleroux ah yes, thanks for the clarification. That's what I was wondering. It's up to the user to decide where the upload goes.

filmaj commented Jan 19, 2024

@andybee def want to add sandbox support / thx for mentioning! Will add to the issue.

@brianleroux @andybee it's been a while, but when I implemented https://github.com/filmaj/arc-plugin-s3-image-bucket I had that working without too much trouble. That was a couple of major versions of Architect back, though, and I think it (still) uses the beta plugin API, but it certainly should be doable. IIRC there was an npm package, s3rver, that was a local implementation of the S3 API; it worked great.

So I guess the idea behind this pattern is similar to what @static provides, in that there is a mapping between the ENTIRE @upload pragma and a set of up to 3 buckets, ya? I wonder if there are any use cases where you'd want, say, 2 sets of these resources? I feel like that was kinda the limitation with @static that caused people to go down the route of writing plugins to address various versions of blob management; would a more generic S3 bucket pragma serve these use cases just as well? How would that change things? Just thinking out loud here....

The process for landing the upload somewhere is up to you (code in src/upload) … we can't really predict how folks will want to process files, so that's on them to do the writes.

Actually, I think this answers my musings above and differentiates the @upload pragma from @static. @static's behaviour is somewhat pre-determined by ASAP, whereas here the behaviour is purely userland. Got it. I like that a lot! It puts more of a burden on writing docs / examples / resources to get that across and provide reference points for devs, but clearly marking this as userland territory feels like the right call.

filmaj commented Jan 19, 2024

Looking over my old direct-to-S3 user upload plugin, I recall one thing I struggled with was exposing the plethora of S3 bucket options in one form or another via the arc file (CORS and Lambda notifications on bucket events being just two of those I implemented in that plugin). You can see my attempt at that in the plugin's usage instructions.

andybee commented Jan 19, 2024

Just a quick warning re: s3rver. The project appears to be abandoned, and I've had problems recently with signed URL uploads from the v3 AWS SDK (but, interestingly, not v2). Objects would upload, but using the default S3rver/S3rver credentials I was unable to GetObject the file back again. There weren't any issues once deployed to AWS, nor when using something like LocalStack in place of s3rver. Not sure what the best approach might be here; should we also adopt s3rver (as with Dynalite), to at least ensure its stability moving forward?

@brianleroux

Good questions @filmaj. I was chatting with Block about this a bit, and we definitely think that while this is pretty high level and aimed at specific upload use case(s), it won't exclude a possible lower-level primitive in the future for doing all the things S3 can do (a @buckets pragma, perhaps).

And yeah, this is definitely a lot more burden on docs than code, especially around security considerations wrt the upload form action URL signing business. It's not… terrible… but it is pretty weird [1].

[1] https://github.com/brianleroux/enhance-example-s3-upload/blob/main/app/elements/s3-form.mjs#L8-L41

@sjorsrijsdam

Love this. I recently abandoned an approach for uploading files in a CMS, mainly because it became too much of a hassle. I had to string the s3rver plugin and the plugin-storage-public plugin together. The former needed some wrangling because it didn't really play nice with the v3 AWS SDK out of the box. Then, when trying to port it to aws-lite, I discovered that aws-lite doesn't seem to support the forcePathStyle option that s3rver needs and which AWS has deprecated anyway. So I just ended up yeeting everything into the public directory and calling that good enough.
