Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to define optional file to fetch #3190

Open
skoudoro opened this issue Apr 16, 2024 · 3 comments
Open

Allow to define optional file to fetch #3190

skoudoro opened this issue Apr 16, 2024 · 3 comments

Comments

@skoudoro
Copy link
Member

Currently, _make_fetcher do not allow optional files.

it would be great to have an option that allow to fetch all data or a part of the data.

For example, in the disco data, the noisy images are optional. This could reduced drastically the download time for benchmarking or for the unit tests

@arokem
Copy link
Contributor

arokem commented Apr 16, 2024

Might be simpler to make these two separate fetchers?

@itellaetxe
Copy link
Contributor

itellaetxe commented May 6, 2024

Hi @arokem @skoudoro

I am trying to get familiar with the codebase and saw this issue.

Maybe it would be nice to add an argument called optional_data: bool to _make_fetcher. In case optional_data is True the optional files are also downloaded.
To know which files are optional, we could use a dictionary for the remote_fnames and the local_fnames with two keys: "necessary_files" and "optional_files". Inside each key, we could include the necessary and optional files in separate lists. What do you think about it? dict is easily iterable, and we still have the lists of filenames inside it. In case the optional_data is True, we just join the two lists together and download everything.

This option has the downside of having to identify the non-optional and optional data for each fetcher. It is not as custom as directly specifying what data files you don't need, but I think this last option would be too much customization and it would probably lead to confusion (at least I would get pretty lost).

Edit: After thinking more carefully about it, I also think that making the separate fetchers as @arokem suggests is a fast/easy solution.

@skoudoro
Copy link
Member Author

skoudoro commented May 7, 2024

Hi @itellaetxe,

Maybe it would be nice to add an argument called optional_data: bool to _make_fetcher

Yes, I was going to do this solution. I think this is the quickest and the easiest. optional_data could be a bool or a list of bool (one value per file). the default value would be None.

To know which files are optional, we could use a dictionary for the remote_fnames and the local_fnames with two keys: "necessary_files" and "optional_files". Inside each key, we could include the necessary and optional files in separate lists. What do you think about it? dict is easily iterable, and we still have the lists of filenames inside it. In case the optional_data is True, we just join the two lists together and download everything.

No need for a dict. let's keep the format.

Edit: After thinking more carefully about it, I also think that making the separate fetchers as @arokem suggests is a fast/easy solution.

Personally, I would like to avoid mutiple fectcher for maintenance purpose.

Let us know if you start this work to avoid to duplicate the effort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants