Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weโ€™ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated data/utils.py and engine/validator.py #10206

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

Bhavay-2001
Copy link

@Bhavay-2001 Bhavay-2001 commented Apr 21, 2024

PR for issue #9095.

Reviewers - @glenn-jocher @Burhan-Q

It's a draft PR but I will happy to alter it and make changes. Please review it and provide suggestions to improve.
Thanks

๐Ÿ› ๏ธ PR Summary

Made with โค๏ธ by Ultralytics Actions

๐ŸŒŸ Summary

Enhanced dataset handling now supports JSON in addition to YAML.

๐Ÿ“Š Key Changes

  • Added an extension parameter to check_det_dataset function to support JSON files.
  • Enhanced the dataset validation logic to accept both .yaml/.yml and .json formats for dataset descriptions.

๐ŸŽฏ Purpose & Impact

  • Flexibility: Allows users to specify their dataset descriptors in JSON format, providing more flexibility in how datasets are defined and shared. ๐Ÿ”„
  • Ease of Use: Makes it easier for those already using JSON files in their projects to integrate with Ultralytics tools without needing to convert their files to YAML. ๐ŸŒ
  • Future Proofing: By supporting more formats, Ultralytics makes its tools more versatile and ready for future developments and community needs. ๐Ÿ’ก

@Bhavay-2001
Copy link
Author

I have read the CLA Document and I sign the CLA

Copy link

codecov bot commented Apr 21, 2024

Codecov Report

Attention: Patch coverage is 51.02041% with 24 lines in your changes are missing coverage. Please review.

Project coverage is 74.80%. Comparing base (28cb2f2) to head (a0ba44d).

Files Patch % Lines
ultralytics/data/utils.py 46.15% 14 Missing โš ๏ธ
ultralytics/utils/__init__.py 22.22% 7 Missing โš ๏ธ
ultralytics/engine/exporter.py 0.00% 2 Missing โš ๏ธ
ultralytics/data/explorer/explorer.py 0.00% 1 Missing โš ๏ธ
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10206      +/-   ##
==========================================
- Coverage   77.88%   74.80%   -3.08%     
==========================================
  Files         122      122              
  Lines       15579    15607      +28     
==========================================
- Hits        12133    11675     -458     
- Misses       3446     3932     +486     
Flag Coverage ฮ”
Benchmarks 35.57% <20.40%> (-0.09%) โฌ‡๏ธ
GPU 37.36% <26.53%> (-0.03%) โฌ‡๏ธ
Tests 70.16% <51.02%> (-3.52%) โฌ‡๏ธ

Flags with carried forward coverage won't be shown. Click here to find out more.

โ˜” View full report in Codecov by Sentry.
๐Ÿ“ข Have feedback on the report? Share it here.

@Bhavay-2001
Copy link
Author

Hi @glenn-jocher, I will look into the errors and try to resolve them. However, I need to ask a few things.

  1. You said that I have to test on multiple json datasets, soo how can I look for these datasets. Can you provide some names pls?
  2. How to check this validation loader. I mean any sample script that I can run and it will load this validation loader and help me run the json datasets?
  3. Any command that i can run on my local machine to view these CI errors? Like make style or smth like that?

Thanks

@Bhavay-2001
Copy link
Author

Also @glenn-jocher @Burhan-Q, pls review the PR and share your views about if it's correct or what changes can be made.
Thanks

@Bhavay-2001
Copy link
Author

Hi @glenn-jocher. My approach is that I check for the extension of the dataset and pass it in the check_det_dataset. In the implementation of the dataset, I check if the extension is .yaml or .yml then it loads data from some other file and if it is .json then load it using the json loading function.

Is this approach correct? Should I proceed with changing everywhere where check_det_dataset is called?
Thanks

@glenn-jocher
Copy link
Member

Hi there! Your approach sounds solid for expanding dataset format support. ๐Ÿ‘ Yes, you should proceed with modifying the necessary parts of the code that call check_det_dataset to accommodate the new logic for handling JSON dataset files. If you have any specific areas where you're unsure, feel free to share more details or a code snippet, and I'd be happy to take a closer look. Keep up the great work!

@Bhavay-2001
Copy link
Author

Hi @glenn-jocher, please review the PR and let me know.
Thanks

@Bhavay-2001
Copy link
Author

Hi @glenn-jocher, can you please help with how to resolve this error? I am confused with this.
Thanks

@glenn-jocher
Copy link
Member

Hi there! Sure, Iโ€™d be happy to help. Could you please provide a bit more detail about the error youโ€™re encountering? A snippet of the error message or the context around when it occurs would be really helpful for diagnosing the issue. Looking forward to assisting you! ๐Ÿ˜Š

@Bhavay-2001
Copy link
Author

Bhavay-2001 commented Apr 27, 2024

Hi, so I am encountering issues at 2 places.

  1. file1
    In this, I was encountering an issue which I fixed by passing the extension of data d in check_det_dataset function. Please tell me if this was correct?

  2. ultralytics\ultralytics\data\utils.py

        if self.task == "classify":
            unzip_dir = unzip_file(path)
            data = check_cls_dataset(unzip_dir)
            data["path"] = unzip_dir
        else:  # detect, segment, pose
            _, data_dir, yaml_path = self._unzip(Path(path))
            try:
                # Load YAML with checks
                data = yaml_load(yaml_path)
                data["path"] = ""  # strip path since YAML should be in dataset root for all HUB datasets
                yaml_save(yaml_path, data)
                data = check_det_dataset(yaml_path, ".yaml", autodownload)  # dict
                data["path"] = data_dir  # YAML path should be set to '' (relative) or parent (absolute)
            except Exception as e:
                raise Exception("error/HUB/dataset_stats/init") from e

In this I am encountering the issue on line data = check_det_dataset(yaml_path, ".yaml", autodownload) where the error says
UnboundLocalError: cannot access local variable 'data' where it is not associated with a value which means that data variable doesn't contain any value and is referenced before assignment.

Thanks

@Bhavay-2001
Copy link
Author

Also, please let me know how can I check for CI errors on my vscode only and only push code which is clean and error free. Any helpful commands for that?

@glenn-jocher
Copy link
Member

glenn-jocher commented Apr 27, 2024

Hi! For the changes youโ€™ve made:

  1. Yes, passing the extension to check_det_dataset sounds like a good approach to support different file formats. It keeps your code adaptable for both .yaml and .json.

  2. It looks like the issue might stem from the scope where data is defined. Make sure data is defined before your try block or initialized properly within every block that might use it. If the problem persists, a snippet of how you're handling data would be very helpful!

For checking CI errors locally, you can rely on pre-commit hooks or run specific linting and testing commands based on the CI pipeline of the project. For instance, running flake8 for Python linting or pytest for tests. You might want to look into the project's CI configuration (e.g., .github/workflows/ci.yml) to see which commands are run. Also, setting up a Docker environment mirroring the CI environment can help catch errors early. ๐Ÿ˜Š

At minimum, you should run pytests to verify that non of the tests fail.

Hope this helps!

@Bhavay-2001
Copy link
Author

Hi @glenn-jocher, Can you please check and merge?

@glenn-jocher
Copy link
Member

@Bhavay-2001 hi there! ๐Ÿ‘‹ Thanks for the heads up. I'll review the changes ASAP and get back to you with any feedback or proceed with merging if everything looks good. Appreciate your patience and contribution! ๐Ÿ˜Š

@Bhavay-2001
Copy link
Author

Hi @glenn-jocher, any updates on the PR?

@glenn-jocher
Copy link
Member

Hi @Bhavay-2001! Thanks for checking in. I'm currently reviewing the PR and will provide feedback or approve it shortly. Hang tight! ๐Ÿ˜Š If thereโ€™s anything specific youโ€™d like to discuss or need help with in the meantime, feel free to let me know!

@Bhavay-2001
Copy link
Author

Hi @glenn-jocher, any updates on PR?

@glenn-jocher
Copy link
Member

Hi there! ๐Ÿ‘‹ We're currently reviewing the PR and will provide feedback or move forward with merging very soon. Thanks for your patience! If there's anything else you'd like to discuss in the meantime, feel free to reach out. ๐Ÿ˜Š

@Bhavay-2001
Copy link
Author

Hi @glenn-jocher, any updates?

@glenn-jocher
Copy link
Member

@Bhavay-2001 hi there! ๐Ÿ‘‹ We're actively reviewing the PR and will keep you updated. Thanks for your patience! If thereโ€™s anything specific you need help with, feel free to let me know. ๐Ÿ˜Š

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants