Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GenericHDF5Reader #6356

Open
wants to merge 71 commits into
base: master
Choose a base branch
from
Open

Add GenericHDF5Reader #6356

wants to merge 71 commits into from

Conversation

gjover
Copy link

@gjover gjover commented Feb 28, 2023

Issue

Need for a generic HDF5 Reader that reads a dataset of choice from an hdf5 file.

Description of changes

Add GenericHDF5Reader
Add an options box in OWFile to allow readers to define additional options

Includes
  • Code changes
  • Tests
  • Documentation

@markotoplak
Copy link
Member

Hello @gjover, hello to Alba!

Could you also provide some differently structure hfd5s for testing? If you can not share them publicly, feel free to email them.

In the future, this addition will also need some test files that we will keep in the repository for automated testing, but those should be prepared with special care, so they are very small.

@janezd
Copy link
Contributor

janezd commented Apr 21, 2023

/rebase

@markotoplak
Copy link
Member

/rebase

@markotoplak
Copy link
Member

Thanks for the contribution. I did not test, so I only commented on the code structure.

As you saw when implementing this, the current File is reader-agnostic. We tried making it such that it does not know anything about specific formats, and we would like to keep it that way.

Therefore, any reader-specific interface should be packaged with the reader. A reader could define a variable that lists the class that defines UI for that specific reader. Then, OWFile could check whether that variable exists; if it does, it could call that UI. That UI class should return reader properties that should then be passed on to the reader class when the actual reading is done (and, of course, stored in OWFile settings).

So, for HDF5 support, any addition to OWFile should be tiny and only call what is specified in the reader class (in practice, the contents of _hdf5_tree_model should be moved and restructured). This function separation is very important to us because it helps keep the code maintainable.

@markotoplak
Copy link
Member

Also, please rebase your branch to the current orange3/master so that there are no merge commits in the PR. If you have problems with that, we can do it for you, but then you'll have to carefully synchronize your checkout-out branch.

@markotoplak
Copy link
Member

I am thinking about how to go forward. One option is that you propose how complex readers should interact with file reading widgets (in general, not OWFile specific), either in code or as a spec. Or, perhaps more efficiently, we could structure the interoperation between readers of reading widgets together, and then you could implement just the reader part following that interface while we implement the OWFile part.

@gjover
Copy link
Author

gjover commented Jan 25, 2024

Sure, I agree.

As commented before, we added a hidden options_box for these readers that may use them.
What about to check if the reader has the method _set_options_box, and call this Reader method in such a case?

In that way OWFile would keep agnostic but would allow the Reader to set the options box when needed.

@gjover
Copy link
Author

gjover commented Jan 29, 2024

I have remove the option box and use the sheet mechanism instead.

Copy link

codecov bot commented Feb 14, 2024

Codecov Report

Merging #6356 (8520bb9) into master (7bdc8e4) will decrease coverage by 0.04%.
Report is 1 commits behind head on master.
The diff coverage is 42.22%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6356      +/-   ##
==========================================
- Coverage   88.13%   88.10%   -0.04%     
==========================================
  Files         322      322              
  Lines       70577    70619      +42     
==========================================
+ Hits        62206    62218      +12     
- Misses       8371     8401      +30     

@stuart-cls
Copy link

I tried this on a few files, it seems to work well as a fallback reader. I did find it odd that the indexes of multidimensional datasets end up in features, in orange-spectroscopy we usually put this into Metas.

I squashed the history into two commits for easier reading: https://github.com/stuart-cls/orange3/tree/generic-hdf5-squash . I don't think the owfile.py changes are actually used in the latest code.

@markotoplak I don't want to hijack this PR too much, but I tried to combine this with an implementation of the Orange on-disk format from dask to see if we could have subclasses register that they are better able to handle a particular HDF5 format (we have a few of these in orange-spectroscopy). See orange-hdf5 and base-hdf5

It works, but this approach is flawed because you lose (at least) the sheets functionality on your subclasses, which is a problem for these metareaders in orange-spectroscopy as well.

@markotoplak
Copy link
Member

@stuart-cls, could you open a PR with your orange-hdf5 reader/writer? I checked it quickly, and it looks good. We do need something similar in the distro. I also like that you handle variable-length strings (I have no idea how I missed what you did when I was programming that part for Dask). Thanks!

@markotoplak
Copy link
Member

@gjover, just to double check: your changes here in the File widget are not necessary for the reader, right? If so, let's merge this reader into github.com/quasars/orange-spectroscopy for now. Could you open a PR there, please? If you don't have time, I can take your change set and open the PR myself.

@markotoplak
Copy link
Member

After we have these parts, we can also think about the meta reader that Stuart suggested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants