Updated data/utils.py and engine/validator.py #10206

Bhavay-2001 · 2024-04-21T17:37:42Z

PR for issue #9095.

It's a draft PR, but I will be happy to alter it and make changes. Please review it and provide suggestions to improve it.
Thanks

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced dataset handling now supports JSON in addition to YAML.

📊 Key Changes

Added an extension parameter to check_det_dataset function to support JSON files.
Enhanced the dataset validation logic to accept both .yaml/.yml and .json formats for dataset descriptions.

🎯 Purpose & Impact

Flexibility: Allows users to specify their dataset descriptors in JSON format, providing more flexibility in how datasets are defined and shared. 🔄
Ease of Use: Makes it easier for those already using JSON files in their projects to integrate with Ultralytics tools without needing to convert their files to YAML. 🌐
Future Proofing: By supporting more formats, Ultralytics makes its tools more versatile and ready for future developments and community needs. 💡
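
As a rough illustration of what a JSON dataset descriptor could look like, the sketch below writes a small file and reads it back using only Python's standard json module. The keys (path, train, val, names), the file name, and the values are assumptions that mirror the layout commonly used in YAML dataset files; the PR summary itself does not specify a schema.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical descriptor; the keys mirror a typical YAML dataset file (assumption).
# JSON object keys are always strings, so class indices are stored as "0", "1", ...
descriptor = {
    "path": "datasets/example",
    "train": "images/train",
    "val": "images/val",
    "names": {"0": "person", "1": "car"},
}


def save_descriptor(data: dict, file: Path) -> None:
    """Write the descriptor as indented JSON."""
    file.write_text(json.dumps(data, indent=2), encoding="utf-8")


def load_descriptor(file: Path) -> dict:
    """Read a JSON descriptor back into a dict."""
    return json.loads(file.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    json_file = Path(tmp) / "example.json"
    save_descriptor(descriptor, json_file)
    loaded = load_descriptor(json_file)

# Convert the string keys back to integer class indices after loading.
names = {int(k): v for k, v in loaded["names"].items()}
```

One difference from YAML worth keeping in mind: integer keys do not survive a JSON round trip, so any code that expects integer class indices needs a conversion step like the last line.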

Bhavay-2001 · 2024-04-21T17:39:31Z

I have read the CLA Document and I sign the CLA

codecov · 2024-04-21T17:45:42Z

Codecov Report

Attention: Patch coverage is 51.02041%, with 24 lines in your changes missing coverage. Please review.

Project coverage is 74.80%. Comparing base (28cb2f2) to head (a0ba44d).

Files	Patch %	Lines
ultralytics/data/utils.py	46.15%	14 Missing ⚠️
ultralytics/utils/__init__.py	22.22%	7 Missing ⚠️
ultralytics/engine/exporter.py	0.00%	2 Missing ⚠️
ultralytics/data/explorer/explorer.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #10206      +/-   ##
==========================================
- Coverage   77.88%   74.80%   -3.08%     
==========================================
  Files         122      122              
  Lines       15579    15607      +28     
==========================================
- Hits        12133    11675     -458     
- Misses       3446     3932     +486

Flag	Coverage Δ
Benchmarks	`35.57% <20.40%> (-0.09%)`	⬇️
GPU	`37.36% <26.53%> (-0.03%)`	⬇️
Tests	`70.16% <51.02%> (-3.52%)`	⬇️

Flags with carried forward coverage won't be shown.

Bhavay-2001 · 2024-04-24T12:19:27Z

Hi @glenn-jocher, I will look into the errors and try to resolve them. However, I need to ask a few things.

You said that I have to test on multiple JSON datasets, so how can I find these datasets? Can you provide some names, please?
How do I check this validation loader? Is there a sample script that I can run that will load the validation loader and let me run the JSON datasets?
Is there a command that I can run on my local machine to view these CI errors, like make style or something similar?

Thanks

Bhavay-2001 · 2024-04-24T12:20:16Z

Also @glenn-jocher @Burhan-Q, please review the PR and share your views on whether it's correct or what changes should be made.
Thanks

Bhavay-2001 · 2024-04-24T16:32:32Z

Hi @glenn-jocher. My approach is to check the extension of the dataset file and pass it to check_det_dataset. Inside that function, if the extension is .yaml or .yml, the data is loaded with a function from another file, and if it is .json, it is loaded with the JSON loading function.

Is this approach correct? Should I proceed with changing everywhere where check_det_dataset is called?
Thanks
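
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: load_dataset_file is a hypothetical helper, and the YAML loader is passed in as a parameter so the example depends only on the standard library (in the project it would presumably be the existing YAML loading function).

```python
import json
from pathlib import Path


def load_dataset_file(file, yaml_loader=None):
    """Hypothetical helper: choose a loader based on the file extension."""
    file = Path(file)
    suffix = file.suffix.lower()
    if suffix in {".yaml", ".yml"}:
        if yaml_loader is None:
            raise ValueError("a YAML loader must be supplied for .yaml/.yml files")
        return yaml_loader(file)
    if suffix == ".json":
        with open(file, encoding="utf-8") as f:
            return json.load(f)
    # Fail loudly instead of falling through with nothing loaded.
    raise ValueError(f"Unsupported dataset file extension: '{suffix}'")
```

Reading the extension from the path itself, instead of passing it as a separate argument, is one possible design choice: existing call sites would not need a new parameter. Whether that fits the project is a judgment call for the maintainers.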

glenn-jocher · 2024-04-24T16:48:09Z

Hi there! Your approach sounds solid for expanding dataset format support. 👍 Yes, you should proceed with modifying the necessary parts of the code that call check_det_dataset to accommodate the new logic for handling JSON dataset files. If you have any specific areas where you're unsure, feel free to share more details or a code snippet, and I'd be happy to take a closer look. Keep up the great work!

Bhavay-2001 · 2024-04-26T09:52:20Z

Hi @glenn-jocher, please review the PR and let me know.
Thanks

Bhavay-2001 · 2024-04-26T11:30:20Z

Hi @glenn-jocher, can you please help me resolve this error? I am confused by it.
Thanks

glenn-jocher · 2024-04-26T21:37:35Z

Hi there! Sure, I’d be happy to help. Could you please provide a bit more detail about the error you’re encountering? A snippet of the error message or the context around when it occurs would be really helpful for diagnosing the issue. Looking forward to assisting you! 😊

Bhavay-2001 · 2024-04-27T07:17:18Z

Hi, so I am encountering issues in two places.

file1
In this file, I was encountering an issue, which I fixed by passing the extension of data d to the check_det_dataset function. Please tell me if this is correct.
ultralytics\ultralytics\data\utils.py

        if self.task == "classify":
            unzip_dir = unzip_file(path)
            data = check_cls_dataset(unzip_dir)
            data["path"] = unzip_dir
        else:  # detect, segment, pose
            _, data_dir, yaml_path = self._unzip(Path(path))
            try:
                # Load YAML with checks
                data = yaml_load(yaml_path)
                data["path"] = ""  # strip path since YAML should be in dataset root for all HUB datasets
                yaml_save(yaml_path, data)
                data = check_det_dataset(yaml_path, ".yaml", autodownload)  # dict
                data["path"] = data_dir  # YAML path should be set to '' (relative) or parent (absolute)
            except Exception as e:
                raise Exception("error/HUB/dataset_stats/init") from e

Here, I am encountering the issue on the line data = check_det_dataset(yaml_path, ".yaml", autodownload), where the error says:
UnboundLocalError: cannot access local variable 'data' where it is not associated with a value
This means that the data variable doesn't contain any value and is referenced before assignment.

Thanks

Bhavay-2001 · 2024-04-27T07:21:57Z

Also, please let me know how I can check for CI errors locally in VS Code, so that I only push code that is clean and error-free. Are there any helpful commands for that?

glenn-jocher · 2024-04-27T17:43:56Z

Hi! For the changes you’ve made:

Yes, passing the extension to check_det_dataset sounds like a good approach to support different file formats. It keeps your code adaptable for both .yaml and .json.
It looks like the issue might stem from the scope where data is defined. Make sure data is defined before your try block or initialized properly within every block that might use it. If the problem persists, a snippet of how you're handling data would be very helpful!
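
To make the scoping point concrete, here is a minimal sketch of one common way this error arises with extension-based branching. It is an assumption that the PR hits this exact pattern; parse_broken and parse_fixed are hypothetical names used only for illustration.

```python
import json


def parse_broken(text, extension):
    # 'data' is assigned only when the branch matches.
    if extension == ".json":
        data = json.loads(text)
    return data  # raises UnboundLocalError for any other extension


def parse_fixed(text, extension):
    # Every path either assigns 'data' or raises a clear error.
    if extension == ".json":
        data = json.loads(text)
    else:
        raise ValueError(f"Unsupported extension: '{extension}'")
    return data
```

Calling parse_broken with an extension that matches no branch reproduces the UnboundLocalError, while parse_fixed reports the unsupported extension directly.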

For checking CI errors locally, you can rely on pre-commit hooks or run specific linting and testing commands based on the CI pipeline of the project. For instance, running flake8 for Python linting or pytest for tests. You might want to look into the project's CI configuration (e.g., .github/workflows/ci.yml) to see which commands are run. Also, setting up a Docker environment mirroring the CI environment can help catch errors early. 😊

At minimum, you should run pytest to verify that none of the tests fail.
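
A small helper along the following lines could run such checks in one go before pushing. It is only a sketch: run_checks and all_passed are hypothetical names, and the commands in the usage comment assume that flake8 and pytest are installed and that the tests live in a tests directory. The project's CI configuration remains the authority on which commands actually run.

```python
import subprocess
import sys


def run_checks(commands):
    """Run each command and return a dict mapping its name to its exit code."""
    results = {}
    for name, cmd in commands.items():
        completed = subprocess.run(cmd, check=False)
        results[name] = completed.returncode
    return results


def all_passed(results):
    """Return True only if every command exited with code 0."""
    return all(code == 0 for code in results.values())


# Example usage (assumed commands; adjust to match the project's CI workflow):
# checks = {
#     "lint": [sys.executable, "-m", "flake8", "."],
#     "tests": [sys.executable, "-m", "pytest", "tests"],
# }
# results = run_checks(checks)
# print(results, "OK" if all_passed(results) else "FAILED")
```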

Hope this helps!

Bhavay-2001 · 2024-05-02T16:32:43Z

Hi @glenn-jocher, Can you please check and merge?

glenn-jocher · 2024-05-03T00:59:54Z

@Bhavay-2001 hi there! 👋 Thanks for the heads up. I'll review the changes ASAP and get back to you with any feedback or proceed with merging if everything looks good. Appreciate your patience and contribution! 😊

Bhavay-2001 · 2024-05-05T04:45:27Z

Hi @glenn-jocher, any updates on the PR?

glenn-jocher · 2024-05-05T21:40:38Z

Hi @Bhavay-2001! Thanks for checking in. I'm currently reviewing the PR and will provide feedback or approve it shortly. Hang tight! 😊 If there’s anything specific you’d like to discuss or need help with in the meantime, feel free to let me know!

Bhavay-2001 · 2024-05-09T11:06:01Z

Hi @glenn-jocher, any updates on the PR?

glenn-jocher · 2024-05-09T23:29:08Z

Hi there! 👋 We're currently reviewing the PR and will provide feedback or move forward with merging very soon. Thanks for your patience! If there's anything else you'd like to discuss in the meantime, feel free to reach out. 😊

Bhavay-2001 · 2024-05-13T13:28:08Z

Hi @glenn-jocher, any updates?

glenn-jocher · 2024-05-13T20:36:56Z

@Bhavay-2001 hi there! 👋 We're actively reviewing the PR and will keep you updated. Thanks for your patience! If there’s anything specific you need help with, feel free to let me know. 😊

Bhavay-2001 and others added 3 commits April 21, 2024 23:03

Updated data/utils.py and engine/validator.py

bbfaf29

Merge branch 'main' into add_multiple_datasets

8675f88

Auto-format by https://ultralytics.com/actions

bb3c6a0

Bhavay-2001 mentioned this pull request Apr 21, 2024

Updated validator.py #9365

Closed

Bhavay-2001 and others added 3 commits April 22, 2024 21:11

Updated data/utils.py

7e8565b

Auto-format by https://ultralytics.com/actions

45895be

Merge branch 'main' into add_multiple_datasets

8ad79cb

Bhavay-2001 added 2 commits April 26, 2024 15:20

Updated multiple files where check_det_dataset was used

ba21eec

Merge branch 'add_multiple_datasets' of https://github.com/Bhavay-200…

96a4d37

…1/ultralytics into add_multiple_datasets

Bhavay-2001 and others added 3 commits April 26, 2024 16:09

Updated train_world.py file

cdd2ede

Auto-format by https://ultralytics.com/actions

bab43f5

Merge branch 'main' into add_multiple_datasets

d0bce5b

Bhavay-2001 and others added 5 commits April 28, 2024 17:33

Updated data/utils.py utils/__init__.py

c53e156

Merge branch 'add_multiple_datasets' of https://github.com/Bhavay-200…

2c23a4e

…1/ultralytics into add_multiple_datasets

Auto-format by https://ultralytics.com/actions

578411e

Merge branch 'main' into add_multiple_datasets

71a6f6b

Updated data/utils.py

c2a2045

Bhavay-2001 added 2 commits May 3, 2024 16:52

Merge branch 'main' into add_multiple_datasets

f4cb570

Merge branch 'main' into add_multiple_datasets

a0ba44d
