
ultralytics 8.2.14 add task + OBB to hub.check_dataset() #12573

Merged
13 commits merged on May 12, 2024

Burhan-Q (Member) commented May 11, 2024

Summary

This PR makes both arguments of check_dataset mandatory. Currently both path and task have default values, so the function can be run without providing either argument, and doing so raises an error that is difficult to interpret.

Additionally, because task has a default value, check_dataset will pass a dataset checked against the wrong task, which is likely to cause problems when the user later uploads the dataset to HUB; see ultralytics/hub#681. This change also adds type hints for the function arguments and return value, updates the docstring to reflect these changes, and adds the missing obb task to the supported tasks.
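A minimal sketch of the proposed signature, for illustration only: it is not the ultralytics implementation, the set name SUPPORTED_TASKS is hypothetical, and the actual dataset checks are replaced by a print.

```python
# Hypothetical sketch of the proposed signature; not the actual ultralytics implementation.
SUPPORTED_TASKS = {"detect", "segment", "pose", "obb", "classify"}  # assumed task names


def check_dataset(path: str, task: str) -> None:
    """Validate arguments, then (in the real function) run the HUB dataset checks.

    Args:
        path: Path to the dataset file to check, e.g. "coco8-pose.zip".
        task: Dataset task; must be one of SUPPORTED_TASKS.
    """
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"Invalid task {task!r}, expected one of {sorted(SUPPORTED_TASKS)}")
    # The dataset statistics/label checks are omitted; this sketch only validates its inputs.
    print(f"Starting HUB dataset checks for {path} (task={task!r})")


check_dataset("coco8-pose.zip", "pose")  # OK
# check_dataset("coco8-pose.zip")        # TypeError: missing 1 required positional argument: 'task'
```

Because neither parameter has a default, forgetting task fails immediately at the call site instead of silently checking a pose dataset as "detect".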

Current behavior: without specifying task, the check passes on a pose dataset as the "detect" task

from ultralytics.hub import check_dataset

check_dataset(r"Q:\datasets\coco8-pose.zip") # uses default task="detect" value

>>> Starting HUB dataset checks for Q:\datasets\coco8-pose.zip....

>>> WARNING ⚠️ Skipping Q:\datasets\coco8-pose.zip unzip as destination directory Q:\datasets\coco8-pose is not empty.

>>> Scanning Q:\datasets\coco8-pose\labels\train.cache... 4 images, 0 backgrounds, 0 corrupt: 100%|██████████| 4/4 [00:00<?, ?it/s]
>>> Statistics: 100%|██████████| 4/4 [00:00<?, ?it/s]
>>> Scanning Q:\datasets\coco8-pose\labels\val.cache... 4 images, 0 backgrounds, 0 corrupt: 100%|██████████| 4/4 [00:00<?, ?it/s]
>>> Statistics: 100%|██████████| 4/4 [00:00<?, ?it/s]

>>> Checks completed correctly ✅. Upload this dataset to https://hub.ultralytics.com/datasets/.

Current behavior: with no arguments provided, an ambiguous error is shown

from ultralytics.hub import check_dataset

check_dataset()

>>> Starting HUB dataset checks for Q:\ML_dev\yolov8.1\ultralytics....
>>> Traceback (most recent call last):
>>>   File "Q:\ML_dev\yolov8.1\ultralytics\ultralytics\data\utils.py", line 465, in __init__
>>>     data = yaml_load(yaml_path)
            ^^^^^^^^^^^^^^^^^^^^
>>>     assert Path(file).suffix in {".yaml", ".yml"}, f"Attempting to load non-YAML file {file} with yaml_load()"
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> AssertionError: Attempting to load non-YAML file Q:\ML_dev\yolov8.1\ultralytics with yaml_load()

>>> The above exception was the direct cause of the following exception:

>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>>   File "Q:\ML_dev\yolov8.1\ultralytics\ultralytics\hub\__init__.py", line 127, in check_dataset
>>>     HUBDatasetStats(path=path, task=task).get_json()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>   File "Q:\ML_dev\yolov8.1\ultralytics\ultralytics\data\utils.py", line 471, in __init__
>>>     raise Exception("error/HUB/dataset_stats/init") from e
>>> Exception: error/HUB/dataset_stats/init
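In the traceback, the missing path appears to resolve to the current working directory, and the problem only surfaces deep inside yaml_load(). A fail-fast check could give a readable message instead. This is a sketch under assumptions: validate_dataset_path is a hypothetical helper, and accepting .zip, .yaml or .yml files is an assumption based on the error above.

```python
from pathlib import Path

# Assumption: the HUB check accepts a dataset .zip (or a data .yaml) file.
ACCEPTED_SUFFIXES = {".zip", ".yaml", ".yml"}


def validate_dataset_path(path: str) -> Path:
    """Hypothetical helper: fail fast with a readable message for unusable dataset paths.

    Only the path string is inspected; whether the file exists is not checked here.
    """
    if not path:
        raise ValueError("A dataset path is required, e.g. 'datasets/coco8-pose.zip'.")
    p = Path(path)
    suffix = p.suffix.lower() or "<none>"
    if suffix not in ACCEPTED_SUFFIXES:
        raise ValueError(f"Expected a .zip or .yaml dataset file, got {str(p)!r} (suffix {suffix})")
    return p
```

For example, validate_dataset_path("ultralytics") raises `ValueError: Expected a .zip or .yaml dataset file, got 'ultralytics' (suffix <none>)`, which points directly at the bad argument.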

Behavior with the proposed changes

from ultralytics.hub import check_dataset

# Error when called without the task argument
check_dataset(r"Q:\datasets\coco8-pose.zip")

>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>

>>> TypeError: check_dataset() missing 1 required positional argument: 'task'


# Error when called without arguments
check_dataset()
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>

>>> TypeError: check_dataset() missing 2 required positional arguments: 'path' and 'task'
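The TypeError messages above come from Python itself once the defaults are removed. A test could guard against defaults creeping back in by checking which parameters are required. This sketch uses the standard inspect module on stand-in functions; the legacy path default of "" is an assumption, and a real test would import ultralytics.hub.check_dataset instead.

```python
import inspect


def legacy_check_dataset(path: str = "", task: str = "detect") -> None:  # old style; path default assumed
    ...


def check_dataset(path: str, task: str) -> None:  # stand-in with the proposed signature
    ...


def required_params(func) -> list:
    """Return the names of parameters that have no default value."""
    return [
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.default is inspect.Parameter.empty
    ]


print(required_params(legacy_check_dataset))  # []
print(required_params(check_dataset))  # ['path', 'task']

try:
    check_dataset()
except TypeError as err:
    print(err)  # check_dataset() missing 2 required positional arguments: 'path' and 'task'
```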

Additional work needed

Fundamentally, the underlying process may need to reconcile the task the dataset actually represents with the value passed to the task argument. Since that would require much more time, requiring the argument seemed a simpler way to avoid the issue: users who must state the task explicitly are less likely to pass the wrong one, a mistake that would end up costing them more time.
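As a rough illustration of why detecting the task is harder than requiring it: YOLO-format label lines differ mainly in how many values they carry, so counting values narrows the possible tasks but often leaves several candidates. candidate_tasks is a hypothetical heuristic, and the per-task layouts in its docstring are stated assumptions.

```python
def candidate_tasks(label_line: str) -> set:
    """Hypothetical heuristic: tasks whose YOLO label layout fits one label line.

    Assumed layouts (values per line, including the class index):
      detect:  class + x y w h                      -> 5
      obb:     class + 4 corner points (x, y)       -> 9
      segment: class + >= 3 polygon points (x, y)   -> odd, >= 7
      pose:    class + x y w h + keypoints (2 or 3 values each)
    Classification datasets have no label files, so they are not covered.
    """
    n = len(label_line.split())
    tasks = set()
    if n == 5:
        tasks.add("detect")
    if n == 9:
        tasks.add("obb")
    if n >= 7 and n % 2 == 1:
        tasks.add("segment")
    if n > 5 and ((n - 5) % 2 == 0 or (n - 5) % 3 == 0):
        tasks.add("pose")
    return tasks


pose_line = "0 " + " ".join(["0.5"] * 55)  # 1 class + 4 box + 17 keypoints x 3 = 56 values
print(sorted(candidate_tasks("0 0.5 0.5 0.2 0.3")))  # ['detect']
print(sorted(candidate_tasks(pose_line)))  # ['pose']
print(sorted(candidate_tasks("0 " + " ".join(["0.1"] * 8))))  # ['obb', 'pose', 'segment']
```

A 9-value line fits OBB, a 4-point polygon, or a pose label with two 2D keypoints, so a reliable check would also need dataset metadata (for example the keypoint shape of a pose dataset).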

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Stricter argument requirements and an expanded range of supported tasks in the check_dataset function.

📊 Key Changes

  • Mandatory parameters: The path and task parameters in the check_dataset function are no longer optional. Users must specify these when calling the function.
  • Expanded task options: Added 'obb' (oriented bounding box) to the list of tasks that a dataset can be associated with, alongside 'detect', 'segment', 'pose', and 'classify'.

🎯 Purpose & Impact

  • Increased Clarity: Requiring both arguments and adding type hints makes the function's usage clearer and less error-prone for developers, improving code quality.
  • Extended Functionality: Supporting 'obb' as a task lets check_dataset validate oriented bounding box datasets before they are uploaded to Ultralytics HUB. This potentially benefits researchers and developers working with oriented bounding box data, such as in satellite imagery analysis and certain types of object detection.

👩‍💻 For developers, these changes mean more robust and versatile tools at their disposal.
🌍 For users, expect smoother experiences and support for more dataset types in applications powered by Ultralytics technology.

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhancements to model visualization, an extended set of dataset tasks, and a minor version bump.

📊 Key Changes

  • Added the TASKS list and expanded the use of TASKS for better clarity and modularity.
  • Introduced a new testing function for model predictions with visualize=True.
  • Expanded dataset tasks to include 'obb' for Oriented Bounding Box datasets.
  • Updated feature visualization to accommodate additional model heads.
  • Minor version update from 8.2.13 to 8.2.14.

🎯 Purpose & Impact

  • Enhanced Testing & Visualization: New tests ensure that models can visualize predictions effectively, making debugging and model evaluation easier. 🖼️
  • Broader Dataset Task Support: The addition of 'obb' (Oriented Bounding Box) support allows for more diverse datasets to be utilized, expanding the toolkit's applicability in object detection tasks. 📈
  • Improved Feature Visualization: Expansion in the types of model heads supported for feature visualization aids in a deeper understanding of how different models process images. 🧠
  • Streamlined Codebase: Better organization and minor enhancements promote a smoother development experience and potentially more stable feature releases. 🛠️

Burhan-Q added the enhancement (New feature or request) and HUB (Ultralytics HUB issues) labels on May 11, 2024

codecov bot commented May 11, 2024

Codecov Report

Attention: Patch coverage is 95.23810% with 1 line in your changes missing coverage. Please review.

Project coverage is 70.65%. Comparing base (cf24349) to head (90a2c83).

File                         Patch %   Lines
ultralytics/data/utils.py    50.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12573      +/-   ##
==========================================
+ Coverage   70.51%   70.65%   +0.13%     
==========================================
  Files         122      122              
  Lines       15621    15622       +1     
==========================================
+ Hits        11015    11037      +22     
+ Misses       4606     4585      -21     
Flag         Coverage Δ
Benchmarks   35.55% <9.52%> (-0.01%) ⬇️
GPU          37.29% <9.52%> (-0.01%) ⬇️
Tests        66.75% <95.23%> (+0.13%) ⬆️

Flags with carried forward coverage won't be shown.
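The percentages in the report are covered lines divided by tracked lines. A quick check against the numbers above; the patch size of 21 lines is inferred from "95.23810% with 1 line missing" and is not stated in the report.

```python
def coverage(hits: int, lines: int) -> float:
    """Line coverage in percent: covered lines divided by tracked lines."""
    return 100 * hits / lines


print(f"{coverage(11015, 15621):.2f}%")  # base (cf24349): 70.51%
print(f"{coverage(11037, 15622):.2f}%")  # head (90a2c83): 70.65%
print(f"{coverage(20, 21):.5f}%")  # patch, 21 lines with 1 missing: 95.23810%
```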

Burhan-Q changed the title from "check_dataset arguments required, updated docstring and added type hints" to "Change HUB check_dataset to use required arguments" on May 11, 2024
glenn-jocher (Member):

@Burhan-Q this looks good! Can you also add examples for checking OBB and Classify datasets to the docstring?

glenn-jocher (Member):

I think we should be testing these in the tests; let me check. OK, OBB was missing, so I've added it in 869606c.

@sergiuwaxmann @Burhan-Q BTW, it seems logical to add an upload=False argument here so users can automatically upload their datasets programmatically if the checks pass and they are logged in to HUB, right?
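One way the proposed opt-in upload could be wired, sketched with placeholder callables: check_dataset_then_upload, run_checks, upload_dataset and logged_in are all hypothetical names, and no real HUB upload API is implied.

```python
from typing import Callable

# Placeholder signature for the check and upload steps; both are assumed to raise on failure.
Step = Callable[[str, str], None]


def check_dataset_then_upload(
    path: str,
    task: str,
    run_checks: Step,
    upload_dataset: Step,
    upload: bool = False,
    logged_in: bool = False,
) -> bool:
    """Hypothetical flow for an opt-in upload: check first, upload only when asked and logged in.

    Returns True if an upload was performed.
    """
    run_checks(path, task)  # an exception here means nothing is uploaded
    if upload and logged_in:
        upload_dataset(path, task)
        return True
    return False
```

Keeping upload=False as the default preserves the current check-only behavior, and running the checks first guarantees a failing dataset never reaches the upload step.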

Burhan-Q (Member, Author):

Looks like the CI test fails on OBB because the task is not included in the HUBDatasetStats class (in the _round function defined in the get_json method). I only wanted to make a small change to the check_dataset function arguments here, as I expected there would be more to sort out when fixing or updating the class. I didn't realize OBB was missing from the HUBDatasetStats class, and expect that support will have to be added.

Adding an upload argument might be a good idea, but maybe it should be scoped as a separate feature? I also noticed other strange behaviors when running this function, but was primarily looking to fix the one that seemed most important to address up front. If you'd prefer to address all the issues at once, I'll start digging into this on Monday.
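The CI failure described here follows a common pattern: per-task branching in get_json's _round function, where a task missing from the branches cannot be serialized. The sketch below is simplified and is not the HUBDatasetStats code. It uses plain lists, the label keys are assumptions, and it assumes OBB corners can share the segment path.

```python
def round_labels(task: str, labels: dict) -> list:
    """Hypothetical per-task label serialization: [class, *coordinates rounded to 4 decimals].

    Assumed label keys: "cls", plus "bboxes", "segments" or "keypoints" depending on the task.
    OBB corners are assumed to be stored like segment polygons (8 flat values per object).
    """
    if task == "detect":
        coords = labels["bboxes"]
    elif task in {"segment", "obb"}:  # without "obb" here, OBB would hit the error branch below
        coords = labels["segments"]
    elif task == "pose":
        coords = [box + kpts for box, kpts in zip(labels["bboxes"], labels["keypoints"])]
    else:
        raise ValueError(f"Undefined dataset task {task!r}")
    return [[int(c), *(round(float(v), 4) for v in row)] for c, row in zip(labels["cls"], coords)]


obb_labels = {"cls": [0], "segments": [[0.123456, 0.1, 0.2, 0.1, 0.2, 0.2, 0.1, 0.2]]}
print(round_labels("obb", obb_labels))  # [[0, 0.1235, 0.1, 0.2, 0.1, 0.2, 0.2, 0.1, 0.2]]
```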

glenn-jocher (Member):

@Burhan-Q @sergiuwaxmann Oh perfect, then we've caught a bug in the OBB checks here. We should fix it in this PR.

glenn-jocher (Member):

@Burhan-Q @sergiuwaxmann Wait, now I'm really confused: if check_dataset doesn't work at all for OBB, how do we have OBB datasets in HUB? Where did the JSON files with the labels come from if this step is failing?

[Screenshot 2024-05-13 at 00:47:44]

glenn-jocher changed the title from "Change HUB check_dataset to use required arguments" to "ultralytics 8.2.13 add task and OBB to hub.check_dataset()" on May 12, 2024
glenn-jocher changed the title to "ultralytics 8.2.14 add task and OBB to hub.check_dataset()" on May 12, 2024
glenn-jocher changed the title to "ultralytics 8.2.14 add task + OBB to hub.check_dataset()" on May 12, 2024
glenn-jocher merged commit fd748e3 into main on May 12, 2024
16 checks passed
glenn-jocher deleted the hub_data_check branch on May 12, 2024 at 19:27
Development: successfully merging this pull request may close the issue "V9e Instance Segmentation Predict feature map activation error --visualize=True error".

3 participants