Read gzipped files. #635

Open
ccl-core opened this issue Apr 17, 2024 · 0 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@ccl-core
Contributor

ccl-core commented Apr 17, 2024

Currently, we don't have a way to read gzipped files.

PR #636 introduces a hack to infer whether a file has to be opened with gzip from its name.
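For context, name-based inference could look roughly like the sketch below. This is not the code from PR #636; `open_maybe_gzip` is a hypothetical helper, and it assumes text-mode reads of UTF-8 files only.

```python
import gzip
from pathlib import Path


def open_maybe_gzip(path, encoding="utf-8"):
    """Open `path` for text reading, using gzip if the name ends in `.gz`.

    Hypothetical helper: the decision is based purely on the file name,
    which is exactly the limitation discussed in this issue.
    """
    if Path(path).suffix == ".gz":
        # gzip.open in "rt" mode decompresses and decodes transparently.
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "rt", encoding=encoding)
```

A gzipped file whose name does not end in `.gz` would be opened as plain text here, which is why declaring the compression in the Croissant definition would be more robust.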

Currently, however, we don't have a nice way to express this in the Croissant definition itself.

One option to proceed might be to add the gzip media type to the EncodingFormat attribute of a FileObject by concatenating media types using +:

{
  "@type": "cr:FileObject",
  "@id": "file-object",
  "name": "file-object",
  "contentUrl": "file/path/file.json.gz",
  "encodingFormat": "application/gzip+application/json"
}

Another option might be to change EncodingFormat to support a list of media types: "encodingFormat": ["application/gzip", "application/json"].
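To compare the two `encodingFormat` options, here is a minimal sketch of how a reader might normalize either form into a list of media types. `split_encoding_formats` is a hypothetical name, not an existing Croissant function. One caveat with the `+` form: `+` already appears inside some media types as a structured syntax suffix (e.g. `application/ld+json`), so the sketch only splits on a `+` that is followed by a full `type/subtype`.

```python
import re

# Split on "+" only when what follows looks like "type/...", so that
# structured-syntax suffixes such as "application/ld+json" stay intact.
_SEPARATOR = re.compile(r"\+(?=[^+/]+/)")


def split_encoding_formats(encoding_format):
    """Return the media types in `encoding_format` as a list.

    Accepts either a "+"-concatenated string or a list of strings
    (the two options proposed above). Hypothetical helper.
    """
    if isinstance(encoding_format, str):
        return _SEPARATOR.split(encoding_format)
    return list(encoding_format)


print(split_encoding_formats("application/gzip+application/json"))
# ['application/gzip', 'application/json']
print(split_encoding_formats("application/ld+json"))
# ['application/ld+json']
```

The list form avoids this ambiguity entirely, since no string parsing is needed.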

Alternatively, and possibly preferably, we could introduce a new cr:compression keyword for FileObjects, which could be used similarly to the compression keyword that is available for read_json and read_csv in pandas.
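As a rough illustration of the `cr:compression` idea, the sketch below reads a JSON FileObject whose compression is declared explicitly. The `cr:compression` key and its `"gzip"` value are assumptions (the keyword does not exist yet), `load_json_file_object` is a hypothetical helper, and `contentUrl` is treated as a local path for simplicity.

```python
import gzip
import json

# Hypothetical mapping from a `cr:compression` value to a file opener.
# A missing key (None) means the file is not compressed.
_OPENERS = {None: open, "gzip": gzip.open}


def load_json_file_object(file_object):
    """Load the JSON content of a FileObject dict, honoring `cr:compression`."""
    compression = file_object.get("cr:compression")
    if compression not in _OPENERS:
        raise ValueError(f"Unsupported compression: {compression!r}")
    opener = _OPENERS[compression]
    # Both open() and gzip.open() accept text mode with an encoding.
    with opener(file_object["contentUrl"], "rt", encoding="utf-8") as f:
        return json.load(f)
```

With this shape, `encodingFormat` keeps describing the payload (`application/json`) while the compression is stated separately, much like passing `compression="gzip"` to `pandas.read_json`.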
