Explore board based on arrow's S3 support #530
Comments
Via @gshotwell, this much is already possible:
library(pins)
board <- board_connect(server = "https://colorado.posit.co/rsc/",
                       account = "[email protected]",
                       key = Sys.getenv("COLORADO_KEY"))
pin(mtcars, board = board)
library(duckdb)
library(DBI)
con <- DBI::dbConnect(duckdb())
dbExecute(con, "INSTALL 'httpfs.duckdb_extension'")
dbGetQuery(con, "SELECT mpg FROM 'https://colorado.posit.co/rsc/content/519521d1-a6a1-45e6-a5ec-01046686f85f/data.csv'")
This is what Hugging Face does for their flat files. The way they do it is:
I think this would be a very good Connect feature because it reduces the memory footprint of Connect assets without sacrificing much speed.
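To make the memory-footprint point concrete, here is a minimal local sketch. It assumes a temporary CSV as a stand-in for a pinned file (the path, the `cyl = 4` filter, and the variable names are illustrative, not part of the example above). DuckDB scans the file itself, and only the filtered, projected result is materialized as an R data frame.

```r
library(DBI)
library(duckdb)

# Stand-in for a pinned flat file (assumption: a local temp file, not a Connect URL).
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)
csv_path <- normalizePath(csv_path, winslash = "/")

con <- dbConnect(duckdb())

# DuckDB reads the CSV; R only receives the one column and the matching rows.
query <- sprintf("SELECT mpg FROM read_csv_auto('%s') WHERE cyl = 4", csv_path)
four_cyl <- dbGetQuery(con, query)

dbDisconnect(con, shutdown = TRUE)
```

The same query shape should apply to a remote URL via httpfs, as in the example above, though how much data is actually transferred likely depends on the file format and the server.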
Isn't the example above working only because that file is publicly readable? There needs to be some kind of R filesystem abstraction that duckdb can use to authenticate (either an arrow filesystem, something similar to fsspec in Python, or duckdb's httpfs for non-Connect cases). I'm guessing you can use httpfs right now, but it won't support Connect, since Connect is not S3-compatible (httpfs only covers S3, GCS, etc.).
https://arrow.apache.org/docs/r/articles/fs.html#file-systems-that-emulate-s3
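For reference, a rough sketch of what the linked article describes: pointing arrow's `S3FileSystem` at a store that emulates S3 through `endpoint_override`. The helper name, the default endpoint, and the environment variable names are all hypothetical placeholders, and this only applies to S3-compatible stores, which, per the comment above, Connect is not.

```r
# Hypothetical helper: every default below is a placeholder, not a real service.
read_from_s3_like_store <- function(path,
                                    endpoint = "localhost:9000",
                                    access_key = Sys.getenv("S3_ACCESS_KEY"),
                                    secret_key = Sys.getenv("S3_SECRET_KEY"),
                                    scheme = "http") {
  # Build a filesystem object that talks to the S3-emulating endpoint.
  fs <- arrow::S3FileSystem$create(
    access_key = access_key,
    secret_key = secret_key,
    scheme = scheme,
    endpoint_override = endpoint
  )
  # `path` is "bucket/key.parquet"; fs$path() returns a SubTreeFileSystem,
  # which read_parquet() accepts in place of a file name.
  arrow::read_parquet(fs$path(path))
}
```

A call would look something like `read_from_s3_like_store("my-bucket/mtcars.parquet")`, assuming such a bucket exists on the endpoint and arrow was built with S3 support (`arrow::arrow_with_s3()`).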