contentUrl for each format of a file (original proprietary vs archival) #641
Comments
I would lean towards representing them as separate FileObjects. If we want to preserve the connection between them, we could specify sc:encoding pointing from the original file to each of the alternative formats. Would that make sense?
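To make the suggestion concrete, here is a minimal sketch in Python that builds three separate FileObjects as plain dicts and links the original to its alternatives. Everything specific in it is an assumption for illustration: the base URL, the file names, the MIME type strings, and the use of an `encoding` key (taken here to resolve to sc:encoding under the JSON-LD context) are hypothetical and not confirmed by the Croissant examples. Other properties a real FileObject would normally carry, such as sha256, are omitted.

```python
import json

# Hypothetical base URL, for illustration only.
BASE = "https://example.org/files"


def file_object(name, encoding_format):
    """Build a minimal Croissant-style FileObject as a plain dict."""
    return {
        "@type": "cr:FileObject",
        "@id": name,
        "name": name,
        "contentUrl": f"{BASE}/{name}",
        "encodingFormat": encoding_format,
    }


# Hypothetical file names and illustrative MIME types.
original = file_object("survey.dta", "application/x-stata")
tsv = file_object("survey.tab", "text/tab-separated-values")
rdata = file_object("survey.RData", "application/x-rlang-transport")

# Assumption: the original points at each alternative format via "encoding",
# following the sc:encoding idea proposed in this comment.
original["encoding"] = [{"@id": tsv["@id"]}, {"@id": rdata["@id"]}]

distribution = [original, tsv, rdata]
print(json.dumps(distribution, indent=2))
```

Each format gets its own `@id` and `contentUrl`, so a consumer that ignores the `encoding` link still sees three ordinary files.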
@benjelloun sort of? Is there a concrete example in the examples at https://github.com/mlcommons/croissant/tree/v1.0.5/datasets/1.0? I looked quickly but couldn't find one. Either way, it sounds like this could be something we add later. I think most of our users will be happy with the original file.
For some proprietary file formats such as Excel, Stata, and SPSS, Dataverse creates non-proprietary formats (TSV and RData) of uploaded files for archival purposes. Plus a TSV version might be good enough for a researcher who doesn't have the proprietary software installed.
Our Croissant output is still a work in progress, but for now I'm favoring the original, proprietary format under contentUrl.
However, if I wanted to advertise that non-proprietary formats (TSV and RData) are available as well, what's the best practice in Croissant?
Would each format be another FileObject? In our UI (below), we show a single file with multiple download options but maybe from the Croissant perspective these formats would be better represented as different files? They would have different checksums and sizes, after all. 🤔
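The point about checksums and sizes can be illustrated with a small Python sketch. It serializes the same rows in two ways and computes a SHA-256 digest and byte size for each. The sample rows, the `describe` helper, and the choice of JSON as the second serialization are all hypothetical stand-ins; they are not Dataverse's actual conversion, only a demonstration that two renderings of the same data are distinct byte streams.

```python
import hashlib
import json

# Hypothetical sample data standing in for an uploaded tabular file.
rows = [["id", "score"], ["1", "0.5"], ["2", "0.7"]]


def describe(data: bytes) -> dict:
    """Return per-file properties that differ between formats."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "contentSize": f"{len(data)} B",
    }


# Two serializations of the same rows: tab-separated text and JSON.
tsv_bytes = "\n".join("\t".join(r) for r in rows).encode("utf-8")
json_bytes = json.dumps(rows).encode("utf-8")

tsv_info = describe(tsv_bytes)
json_info = describe(json_bytes)

print(tsv_info)
print(json_info)
```

Because the digest and size belong to a specific byte stream, a single FileObject cannot carry correct values for several download formats at once, which is one argument for one FileObject per format.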