Automatically load "datapackage.json" when supplied a path to a folder #157

Open
khusmann opened this issue Oct 10, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@khusmann
Contributor

A lot of frictionless implementations automatically load datapackage.json when reading a directory. It'd be nice to be able to write

read_package("/path/to/packagedir")

instead of always

read_package("/path/to/packagedir/datapackage.json")
@peterdesmet added the enhancement label (New feature or request) on Oct 10, 2023
@peterdesmet
Member

Would be useful, but it is a bit more complex than:

  • Check if the provided path (or URL) ends with datapackage.json
  • If not, append datapackage.json with file.path()
  • Return an error if that file cannot be found
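
The three steps above could be sketched in base R roughly as follows. `resolve_descriptor_path()` is a hypothetical name, not an existing frictionless-r function, and the logic may differ from whatever `read_package()` ends up doing. The sketch assumes any path starting with `http://` or `https://` is a URL and only checks existence for local paths.

```r
# Hypothetical helper, NOT part of the frictionless-r API: a sketch of the
# three steps listed above, using base R only.
resolve_descriptor_path <- function(path) {
  is_url <- grepl("^https?://", path)

  # Step 1: if the path already ends in datapackage.json, keep it as-is.
  if (basename(path) != "datapackage.json") {
    # Step 2: otherwise treat the path as a directory (or package root URL)
    # and append the descriptor name. Trailing slashes are stripped first so
    # file.path() doesn't produce "dir//datapackage.json".
    path <- file.path(sub("/+$", "", path), "datapackage.json")
  }

  # Step 3: fail early if the file is missing. Only local paths are checked
  # here; checking a URL would need an HTTP request, which this sketch skips.
  if (!is_url && !file.exists(path)) {
    stop("Can't find file at `", path, "`.", call. = FALSE)
  }
  path
}

# Example: a temporary directory containing a minimal descriptor
pkg_dir <- tempfile("pkg")
dir.create(pkg_dir)
writeLines("{}", file.path(pkg_dir, "datapackage.json"))
resolve_descriptor_path(pkg_dir)

# A package root URL gets the descriptor name appended:
# returns "https://example.com/my-package/datapackage.json"
resolve_descriptor_path("https://example.com/my-package/")
```

Appending with `file.path()` mirrors the suggestion above; it joins with `/`, so the same call works for both local directories and URLs.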

datapackage.yml and datapackage.yaml files are also valid, so we also need to check whether the provided path ends with one of those. If it doesn't, we'd have to assume a datapackage.json file, potentially missing a datapackage.yml file that is actually there. Maybe we could check for those files as well before reporting an error.
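
Checking for the YAML variants before reporting an error could look roughly like the sketch below. It assumes a local directory (no URLs) and that datapackage.json should win when several descriptors are present; `find_descriptor()` and its `candidates` argument are hypothetical names used for illustration, not frictionless-r functions.

```r
# Hypothetical helper, NOT part of the frictionless-r API: when given a
# local directory, try each descriptor name in order of preference before
# giving up.
find_descriptor <- function(path,
                            candidates = c("datapackage.json",
                                           "datapackage.yml",
                                           "datapackage.yaml")) {
  # Path already names a descriptor file: return it unchanged.
  if (basename(path) %in% candidates) {
    return(path)
  }
  # Build every candidate path and keep the ones that exist on disk
  # (subsetting preserves the order of `candidates`).
  paths <- file.path(sub("/+$", "", path), candidates)
  found <- paths[file.exists(paths)]
  if (length(found) == 0) {
    stop("No datapackage.json, datapackage.yml or datapackage.yaml found in `",
         path, "`.", call. = FALSE)
  }
  # If several exist, the first in `candidates` wins (JSON before YAML).
  found[[1]]
}

# Example: a directory that only contains a YAML descriptor
yaml_dir <- tempfile("pkg")
dir.create(yaml_dir)
writeLines("name: example", file.path(yaml_dir, "datapackage.yaml"))
find_descriptor(yaml_dir)
```

The trade-off is a few extra `file.exists()` calls in exchange for not silently missing a YAML descriptor.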

I think that is why I initially chose the verbose approach, especially since tab completion when writing the path immediately provides feedback on whether a file is present.

@peterdesmet
Member

I'm curious to see how other frictionless software tackles this.

@khusmann
Contributor Author

Ah, makes sense! I was wondering about the YAML support. I saw YAML export functions in frictionless-py but so far haven't seen a datapackage.yaml in the wild, so I was working on the assumption that datapackage.json was the de facto standard.

At first I found the collections on the datahub confusing to get working with frictionless-r, because their file listing has no direct link to datapackage.json (here, for example). Their data-cli tool points to the package's root URL by default, though, and I found that appending /datapackage.json to that URL did the trick.

I think it's nice (especially for new users) to be able to treat a data package as a sort of opaque blob they can load resources from (like tabs in an Excel file), without needing to think about its internal structure. It also makes it easier to distribute packages as self-contained zip files.

@peterdesmet
Member

@khusmann Thanks for investigating. datapackage.json is the only valid format according to the specs, but yml/yaml is supported by frictionless-py, and it was requested and implemented as a feature in frictionless-r. When guessing the descriptor file, I think it's fine to follow frictionless-py (and the specs) and only look for datapackage.json. I'll try to get your PR included in the next version.

@peterdesmet added this to the 1.1.0 milestone on Oct 27, 2023
@peterdesmet changed the milestone from 1.1.0 to 1.2.0 on Mar 25, 2024