Support loading annotations for large CVAT tasks with many jobs #4392

Merged: ehofesmann merged 3 commits into develop from feature/cvat-large-tasks on May 22, 2024

Conversation

ehofesmann
Member

@ehofesmann ehofesmann commented May 10, 2024

What changes are proposed in this pull request?

Optimized loading annotations from the CVAT backend. Annotations are now loaded from individual jobs instead of entire tasks, which allows importing annotations from much larger tasks. For example, one task in the internal CVAT deployment has 10k samples split into 200 jobs of 50 samples each. Previously, loading this task made a single request to the CVAT server for all of the task's annotations at once, which crashed the server. Now, annotations are loaded sequentially, one job at a time, which resolves the problem.
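As a rough illustration of the per-job approach described above (not the actual code in this PR), the sketch below merges annotations fetched one job at a time. The names `load_annotations_by_job` and `fake_fetch_job` are hypothetical, and the `tags`/`shapes`/`tracks` keys are an assumption about the shape of the annotation payload.

```python
from typing import Callable, Dict, Iterable, List

Annotations = Dict[str, List[dict]]


def load_annotations_by_job(
    job_ids: Iterable[int],
    fetch_job: Callable[[int], Annotations],
) -> Annotations:
    """Fetch annotations one job at a time and merge them into one payload.

    `fetch_job` is a hypothetical callable standing in for a single
    per-job request to the server.
    """
    # Assumed payload keys; a real payload may contain others
    merged: Annotations = {"tags": [], "shapes": [], "tracks": []}
    for job_id in job_ids:
        job_annos = fetch_job(job_id)  # one small request per job
        for key in merged:
            merged[key].extend(job_annos.get(key, []))
    return merged


def fake_fetch_job(job_id: int) -> Annotations:
    """Stand-in for a server request; returns one shape per job."""
    return {"tags": [], "shapes": [{"job": job_id}], "tracks": []}


if __name__ == "__main__":
    annos = load_annotations_by_job(range(200), fake_fetch_job)
    print(len(annos["shapes"]))
```

Each request now carries only one job's worth of annotations (50 samples in the example task), so the size of any single response no longer grows with the size of the task.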

How is this patch tested? If it is not, please explain why.

Unit tests pass:

export FIFTYONE_CVAT_URL=...
export FIFTYONE_CVAT_USERNAME=...
export FIFTYONE_CVAT_PASSWORD=...

pytest /path/to/fiftyone/tests/intensive/cvat_tests.py

Also, task 159 on the internal CVAT test deployment, which contains bdd100k-validation, now imports properly. Having the bdd100k validation images available locally on disk makes this easier to test:

import fiftyone as fo
import fiftyone.utils.cvat as fouc
import os

cvat_url = "..."
cvat_username = "..."
cvat_password = "..."

bdd_path = "/path/to/bdd100k-validation/"
# Map each image filename to its full local path
filepaths = os.listdir(bdd_path)
data_map = {fp: os.path.join(bdd_path, fp) for fp in filepaths}

dataset = fo.Dataset()
# WARNING: Only run this on this branch; it will crash the CVAT deployment if run on `develop`
fouc.import_annotations(dataset, task_ids=[159], data_path=data_map, url=cvat_url, username=cvat_username, password=cvat_password)
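The snippet above passes a dict of filename to local path as `data_path`. A slightly more defensive way to build that dict might look like the sketch below; `build_data_map` is a hypothetical helper, not part of FiftyOne, and it only assumes the images sit directly inside one directory.

```python
import os


def build_data_map(image_dir: str) -> dict:
    """Map each filename in `image_dir` to its absolute path.

    Subdirectories are skipped so that only regular files end up in the map.
    """
    data_map = {}
    for name in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, name)
        if os.path.isfile(path):
            data_map[name] = os.path.abspath(path)
    return data_map
```

The result can be passed as `data_path=build_data_map(bdd_path)` in place of the inline dict comprehension.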

Release Notes

Is this a user-facing change that should be mentioned in the release notes?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release
    notes for FiftyOne users.

Optimized loading annotations from the CVAT backend. Annotations are now loaded from individual jobs instead of entire tasks, which allows importing annotations from much larger tasks.

What areas of FiftyOne does this PR affect?

  • App: FiftyOne application changes
  • Build: Build and test infrastructure changes
  • Core: Core fiftyone Python library changes
  • Documentation: FiftyOne documentation changes
  • Other: Annotation integrations

Summary by CodeRabbit

  • New Features
    • Added a method to generate URLs for job annotations in the CVAT tool.
  • Refactor
    • Improved the efficiency of the annotation download process in the CVAT tool.
  • Tests
    • Enhanced test parameters for improved detection labeling accuracy.

@ehofesmann ehofesmann added the annotation Issues related to FiftyOne's annotation API label May 10, 2024
@ehofesmann ehofesmann requested a review from a team May 10, 2024 21:52
@ehofesmann ehofesmann self-assigned this May 10, 2024
Contributor

coderabbitai bot commented May 10, 2024

Walkthrough

The recent updates enhance the CVAT class by introducing a method to generate URLs for job annotations and refining the annotation download process. Additionally, the test suite for detection labeling has been updated by adjusting the segment_size parameter, ensuring more precise unit testing.

Changes

| File Path | Change Summary |
| --- | --- |
| fiftyone/utils/cvat.py | Added job_annotation_url, modified download_annotations, added _get_job_ids in CVAT class. |
| tests/intensive/... | Updated test_detection_labelling method by adding a segment_size parameter. |
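For readers unfamiliar with what a method like `job_annotation_url` might do, here is a minimal sketch of a per-job URL builder. The class `CVATUrls` is hypothetical, and the `/api/jobs/<id>/annotations` layout is an assumption about the server's REST routes rather than something confirmed by this PR.

```python
class CVATUrls:
    """Hypothetical helper that builds per-job annotation URLs."""

    def __init__(self, base_url: str):
        # Assumed layout: REST routes live under <base_url>/api
        self.base_api_url = base_url.rstrip("/") + "/api"

    @property
    def jobs_url(self) -> str:
        return f"{self.base_api_url}/jobs"

    def job_annotation_url(self, job_id: int) -> str:
        """Return the (assumed) annotations URL for a single job."""
        return f"{self.jobs_url}/{job_id}/annotations"
```

Requesting one such URL per job ID is what keeps each response small compared with a single task-level request.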

🐇✨
In the realm of code, where logic is king,
A rabbit hopped in, making changes with a swing.
URLs for jobs, a tweak in the test,
Now everything runs just at its best.
Hop, hop, hooray, let the data flow,
With every line, our project will grow! 🌱🚀
🐇✨


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Out of diff range and nitpick comments (19)
tests/intensive/cvat_tests.py (7)

Line range hint 13-13: Remove unused import collections.defaultdict.

- from collections import defaultdict

Line range hint 14-14: Remove unused import numpy.

- import numpy as np

Line range hint 380-381: Local variables api and task_id are assigned but never used in the test_multiple_fields method.

Consider removing or using these variables if they were intended for future use.


Line range hint 694-694: Local variable person_labels is assigned but never used in the test_example_restricting_label_edits method.

Consider removing or using this variable if it was intended for future use.


Line range hint 816-816: Local variable status is assigned but never used in the test_deleted_tasks method.

Consider removing or using this variable if it was intended for future use.


Line range hint 892-892: Ambiguous variable name l in the test_project_exists method.

Consider renaming l to a more descriptive name to improve code readability.


Line range hint 1327-1327: Local variable results is assigned but never used in the test_frame_start_stop_step method.

Consider removing or using this variable if it was intended for future use.

fiftyone/utils/cvat.py (12)

Line range hint 3439-3439: Use a specific exception type instead of a bare except.

- except:
+ except Exception as e:

Line range hint 3474-3474: Use a specific exception type instead of a bare except.

- except:
+ except Exception as e:

Line range hint 3988-3988: Remove the unused local variable response.

- response = self.get(self.task_data_meta_url(task_id)).json()

Line range hint 4151-4151: Remove the unused local variable response.

- response = self.get(self.task_data_meta_url(task_id)).json()

Line range hint 6075-6075: Remove the unused local variable classes.

- classes = label_info["classes"]

Line range hint 6720-6720: Remove the unused local variable formatted_track.

- formatted_track = self._finalize_track(track, frame_count, only_keyframes)

Line range hint 6856-6856: Use a specific exception type instead of a bare except.

- except:
+ except Exception as e:

Line range hint 6933-6933: Use a specific exception type instead of a bare except.

- except:
+ except Exception as e:

Line range hint 7409-7409: Use a specific exception type instead of a bare except.

- except:
+ except Exception as e:

Line range hint 7418-7418: Use a specific exception type instead of a bare except.

- except:
+ except Exception as e:

Line range hint 7423-7423: Use a specific exception type instead of a bare except.

- except:
+ except Exception as e:

Line range hint 7829-7829: Use a specific exception type instead of a bare except.

- except:
+ except Exception as e:
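To make the bare-`except` suggestions above concrete, the sketch below shows the behavioral difference: `except Exception` handles ordinary errors but lets `KeyboardInterrupt` propagate, whereas a bare `except` would swallow it as well. The helper names are hypothetical and unrelated to the code in `fiftyone/utils/cvat.py`.

```python
def safe_call(fn, default=None):
    """Call `fn()`, returning `default` if it raises an ordinary error."""
    try:
        return fn()
    except Exception:
        # KeyboardInterrupt and SystemExit derive from BaseException,
        # not Exception, so they are not caught here
        return default


def raise_value_error():
    raise ValueError("bad input")


def raise_interrupt():
    raise KeyboardInterrupt
```

Narrowing further to the specific exception types a call is expected to raise is usually better still, when those types are known.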
Review Details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits: Files that changed from the base of the PR and between 032220e and 670bbc5.
Files selected for processing (2)
  • fiftyone/utils/cvat.py (4 hunks)
  • tests/intensive/cvat_tests.py (1 hunks)
Additional Context Used
Ruff (20)
fiftyone/utils/cvat.py (12)

3439-3439: Do not use bare except


3474-3474: Do not use bare except


3988-3988: Local variable response is assigned to but never used


4151-4151: Local variable response is assigned to but never used


6075-6075: Local variable classes is assigned to but never used


6720-6720: Local variable formatted_track is assigned to but never used


6856-6856: Do not use bare except


6933-6933: Do not use bare except


7409-7409: Do not use bare except


7418-7418: Do not use bare except


7423-7423: Do not use bare except


7829-7829: Do not use bare except

tests/intensive/cvat_tests.py (8)

13-13: collections.defaultdict imported but unused


14-14: numpy imported but unused


380-380: Local variable api is assigned to but never used


381-381: Local variable task_id is assigned to but never used


694-694: Local variable person_labels is assigned to but never used


816-816: Local variable status is assigned to but never used


892-892: Ambiguous variable name: l


1327-1327: Local variable results is assigned to but never used

Additional comments not posted (1)
tests/intensive/cvat_tests.py (1)

309-309: Ensure the new segment_size parameter is appropriately tested.

Verification successful

The verification process has confirmed that the segment_size parameter is indeed tested within the test_detection_labelling method in the cvat_tests.py file. The output from the script shows that segment_size is set to 1 in two instances, and there is an assertion checking that the segment_size in the task_json matches the expected value of 1. This indicates that the parameter is being appropriately tested as per the requirements mentioned in the review comment.

Final Response:

The segment_size parameter is appropriately tested in the test_detection_labelling method as evidenced by the assertion checking its value. No further action is required regarding this parameter's testing.

  • Location: tests/intensive/cvat_tests.py
    • Assertion: self.assertEqual(task_json["segment_size"], 1)
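As a back-of-the-envelope check on why `segment_size` matters for this PR, the sketch below computes how many jobs a task would be split into, assuming `segment_size` is the maximum number of samples per job. `expected_job_count` is a hypothetical helper for illustration only.

```python
import math


def expected_job_count(num_samples: int, segment_size: int) -> int:
    """Number of jobs a task is split into, assuming `segment_size`
    is the maximum number of samples per job."""
    if segment_size < 1:
        raise ValueError("segment_size must be at least 1")
    return math.ceil(num_samples / segment_size)
```

Under that assumption, a `segment_size` of 1 yields one job per sample, which exercises the multi-job loading path even on a small test dataset, and the 10k-sample task with 50 samples per job works out to 200 jobs.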
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify that the new `segment_size` parameter is tested in the `test_detection_labelling` method.

# Test: Search for the usage of `segment_size` in test assertions. Expect: At least one assertion involving `segment_size`.
rg --type py 'segment_size' 'tests/intensive/cvat_tests.py'

Length of output: 176

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Out of diff range and nitpick comments (12)
fiftyone/utils/cvat.py (12)

Line range hint 3440-3440: Replace the bare except with a more specific exception type.

- except:
+ except Exception as e:

Line range hint 3475-3475: Replace the bare except with a more specific exception type.

- except:
+ except Exception as e:

Line range hint 3989-3989: Remove the unused local variable response.

- response = self.get(self.task_data_meta_url(task_id)).json()

Line range hint 4152-4152: Remove the unused local variable response.

- response = self.put(self.task_annotation_url(task_id), json=anno_json).json()

Line range hint 6078-6078: Remove the unused local variable classes.

- classes = label_info["classes"]

Line range hint 6723-6723: Remove the unused local variable formatted_track.

- formatted_track = self._finalize_track(track, frame_count, only_keyframes)

Line range hint 6859-6859: Replace the bare except with a more specific exception type.

- except:
+ except Exception as e:

Line range hint 6936-6936: Replace the bare except with a more specific exception type.

- except:
+ except Exception as e:

Line range hint 7412-7412: Replace the bare except with a more specific exception type.

- except:
+ except Exception as e:

Line range hint 7421-7421: Replace the bare except with a more specific exception type.

- except:
+ except Exception as e:

Line range hint 7426-7426: Replace the bare except with a more specific exception type.

- except:
+ except Exception as e:

Line range hint 7832-7832: Replace the bare except with a more specific exception type.

- except:
+ except Exception as e:
Review Details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits: Files that changed from the base of the PR and between 670bbc5 and d19ef6e.
Files selected for processing (1)
  • fiftyone/utils/cvat.py (7 hunks)
Additional Context Used
Ruff (12)
fiftyone/utils/cvat.py (12)

3440-3440: Do not use bare except


3475-3475: Do not use bare except


3989-3989: Local variable response is assigned to but never used


4152-4152: Local variable response is assigned to but never used


6078-6078: Local variable classes is assigned to but never used


6723-6723: Local variable formatted_track is assigned to but never used


6859-6859: Do not use bare except


6936-6936: Do not use bare except


7412-7412: Do not use bare except


7421-7421: Do not use bare except


7426-7426: Do not use bare except


7832-7832: Do not use bare except

brimoor
brimoor previously approved these changes May 20, 2024
Contributor

@brimoor brimoor left a comment


Implementation LGTM 💪

@ehofesmann if you retarget this at release/v0.24.0 we can include in the release this week 🤓

@benjaminpkane benjaminpkane changed the base branch from develop to release/v0.24.0 May 20, 2024 15:13
@benjaminpkane benjaminpkane dismissed brimoor’s stale review May 20, 2024 15:13

The base branch was changed.

@benjaminpkane benjaminpkane changed the base branch from release/v0.24.0 to develop May 20, 2024 15:13
@ehofesmann
Member Author

> Implementation LGTM 💪
>
> @ehofesmann if you retarget this at release/v0.24.0 we can include in the release this week 🤓

Thanks @brimoor! Just getting back to this now, so I assume I missed the window. I still need to get it into teams too. It's OK if it doesn't make it into v0.24.0.

@benjaminpkane I see you changed the base back to develop, is it good to merge into there? If so, can I get a rereview?

@ehofesmann ehofesmann requested a review from brimoor May 22, 2024 21:01
Contributor

@benjaminpkane benjaminpkane left a comment


LGTM 👍

@ehofesmann ehofesmann merged commit af57c3b into develop May 22, 2024
9 of 10 checks passed
@ehofesmann ehofesmann deleted the feature/cvat-large-tasks branch May 22, 2024 22:47
Labels
annotation Issues related to FiftyOne's annotation API

3 participants