Add a thresholding API. #632

Open
GeorgePearse opened this issue Dec 1, 2023 · 14 comments
Labels
enhancement New feature or request

Comments

@GeorgePearse

GeorgePearse commented Dec 1, 2023

Search before asking

  • I have searched the Supervision issues and found no similar feature requests.

Description

Create a simple API to find the best thresholds to maximise some metric (f1-score, precision, recall), given an annotated dataset and a model.

At the minute I use the repo below, because it's the only one I've found that calculates what I need in a reasonable time frame.

https://github.com/yhsmiley/fdet-api

Use case

Anyone wanting to deploy models without manual thresholding (or viewing graphs).

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@GeorgePearse GeorgePearse added the enhancement New feature or request label Dec 1, 2023
@GeorgePearse GeorgePearse changed the title from "At the minute I use this slightly hacky repo in order to calculate the thresholds I want to deploy." to "Add a thresholding API." Dec 1, 2023
@RigvedRocks

I'd like to help by submitting a PR

@SkalskiP
Collaborator

Hi @GeorgePearse and @RigvedRocks 👋🏻 ! Thanks for your interest in supervision. I am sorry that I have not been responsive recently. Before Christmas I was busy with duties unrelated to supervision, and I was off for the last few days.

The idea looks interesting. @RigvedRocks could you share some initial ideas regarding implementation?

@RigvedRocks

I was thinking of using basic ML techniques such as the ROC curve or Youden's J statistic, but the approach outlined by @GeorgePearse above works for me. I guess I can collaborate with @GeorgePearse on this issue if he'd like.

@GeorgePearse
Author

GeorgePearse commented Feb 1, 2024

I'd really like to do what I can to keep this ticking over. @SkalskiP, do you also think it's valuable? I'm always surprised by the lack of open-source implementations for this, and assume that every company just has its own fix.

@RigvedRocks we could do something like this: I create a branch with a "workable" solution based on fdet-api, but starting from the supervision format, and you take it from there? Let me know if that might interest you.

@josephofiowa also curious to hear your thoughts. I used to do this with some voxel51 code (they have a method from which you can get all of the matching predictions for a given IoU), but it was painfully slow.

I keep assuming a "good" solution must exist, but I think the emphasis on threshold-agnostic metrics (mAP etc.) in academia means it doesn't get much attention.

@SkalskiP
Collaborator

SkalskiP commented Feb 1, 2024

Hi @GeorgePearse 👋🏻 I like the idea and I'd love to look at your initial implementation. If possible, I want the solution:

  • To use, as much as possible, the Metrics API already found in Supervision.
  • To avoid external libraries where possible. One of the main principles of Supervision is to limit external dependencies.

Such a solution requires a lot of steps, so I need to understand how we can combine it with what we have and how to design the next elements to be as reusable as possible. We will also need to come up with a better name for this task and for the feature. Haha

@GeorgePearse
Author

GeorgePearse commented Feb 1, 2024

Yeah, that all makes sense. Tbh, the reason I want it to be integrated into supervision is to solve those very problems. At the minute I'm dealing with a lot of opaque code, and I only trust its outputs because I've visually inspected the predictions from lots of model/threshold combos that have used it.

As for API questions, just something like:

# Ideally target_metric could also be a callback so that a user could customise exactly what they want
# to optimize for
per_class_thresholds: dict = optimize_thresholds(
    predictions_dataset,
    annotations_dataset,
    target_metric='f1_score',
    per_class=True,
    minimum_iou=0.75,
)
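
For illustration, the callback form hinted at in the comment could look roughly like this. This is only a sketch: optimize_thresholds does not exist yet, and the assumption that the callback receives per-threshold precision and recall is mine.

# hypothetical: a callable target_metric receiving per-threshold precision/recall
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    if precision + recall == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)

per_class_thresholds: dict = optimize_thresholds(
    predictions_dataset,
    annotations_dataset,
    target_metric=lambda p, r: f_beta(p, r, beta=0.5),  # optimise F0.5 instead of F1
    per_class=True,
    minimum_iou=0.75,
)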

@SkalskiP
Collaborator

SkalskiP commented Feb 1, 2024

And what is stored inside per_class_thresholds? Dict[int, float] - class id to optimal IoU mapping?

What's inside optimize_thresholds? I'd appreciate any pseudocode.

@GeorgePearse
Author

Class id to optimal confidence score; the minimum IoU to classify a prediction and annotation as a match is set upfront by the user. Is that not by far the more common use case for shipping ML products? The minimum IoU is defined by business/product requirements, or can be chosen easily enough visually on a handful of examples. Maybe I'm biased by having mostly trained models where localisation is of secondary importance to classification, and a much, much easier problem.
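
To make that concrete, the return value would just map class id to the confidence threshold that maximises the chosen metric at the fixed IoU (values below are purely illustrative):

# purely illustrative values: class id -> optimal confidence threshold
per_class_thresholds = {
    0: 0.42,
    1: 0.55,
    2: 0.31,
}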

@GeorgePearse
Author

Complete pseudocode:

import numpy as np
import pandas as pd

metrics = []
for class_name in class_list:
    for threshold in np.linspace(0, 1, 100):
        # score the matched predictions for this class at this confidence threshold
        current_metric = calculate_metric(
            grid_of_matched_predictions_and_their_scores,
            class_name=class_name,
            threshold=threshold,
            metric='f1_score',
        )
        metrics.append({
            'threshold': threshold,
            'class_name': class_name,
            'metric': current_metric,
        })

metrics_df = pd.DataFrame(metrics)

# one row per class: the threshold at which the metric is maximised
best_metrics = metrics_df.loc[metrics_df.groupby('class_name')['metric'].idxmax()]

But everything probably needs to be calculated in numpy so it's not painfully slow.

There's a decent chance this is where most people currently get this data: https://github.com/rafaelpadilla/Object-Detection-Metrics. But the repo is what you'd expect of something 5-6 years old, and doesn't have the usability/documentation of a modern open-core project.
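
As a rough sketch of what the numpy version could look like for a single class (names are illustrative; the true-positive matching at the minimum IoU is assumed to have been done elsewhere, e.g. by supervision's existing matching code, and is treated as fixed across thresholds):

import numpy as np

def best_f1_threshold(confidences: np.ndarray,
                      is_true_positive: np.ndarray,
                      n_ground_truth: int,
                      n_steps: int = 100) -> tuple:
    # confidences:      (N,) confidence scores of the predictions for one class
    # is_true_positive: (N,) bool, prediction matched a GT box at the minimum IoU
    #                   (simplification: matching is assumed fixed across thresholds)
    # n_ground_truth:   number of GT boxes for the class
    thresholds = np.linspace(0.0, 1.0, n_steps)
    # keep[i, j] is True if prediction j survives threshold i
    keep = confidences[None, :] >= thresholds[:, None]
    tp = (keep & is_true_positive[None, :]).sum(axis=1)
    fp = (keep & ~is_true_positive[None, :]).sum(axis=1)
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / max(n_ground_truth, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    best = int(np.argmax(f1))
    return float(thresholds[best]), float(f1[best])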

@GeorgePearse
Author

GeorgePearse commented Feb 1, 2024

This is what using fdet-api looks like for me at the minute:

# per-class best-F1 confidence threshold from fdet-api's extended COCO evaluator
thresholds = []
thresholds_dict = {}
f1_score_dict = {}

for counter, class_name in enumerate(annotation_class_names):
    (
        class_name,
        fscore,
        conf,
        precision,
        recall,
        support,
    ) = cocoEval.getBestFBeta(
        beta=1, iouThr=0.5, classIdx=counter, average="macro"
    )
    class_threshold_dict = {
        "class_name": class_name,
        "fscore": fscore,
        "conf": conf,
        "precision": precision,
        "recall": recall,
        "support": support,
    }
    f1_score_dict[class_name] = fscore
    thresholds.append(class_threshold_dict)
    thresholds_dict[class_name] = conf

thresholds_df = pd.DataFrame(thresholds)
print(thresholds_df)

So I end up with both the threshold needed to achieve the metric I care about, and the metrics that that threshold achieves.
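
For completeness, this is roughly how those per-class thresholds end up being used at inference time. It's a sketch under the assumption that detections is an sv.Detections with class_id and confidence arrays, and that class_names maps class ids back to the names used in thresholds_dict above:

import numpy as np
import supervision as sv

def apply_per_class_thresholds(detections: sv.Detections,
                               thresholds_dict: dict,
                               class_names: list) -> sv.Detections:
    # look up the optimised confidence threshold for each detection's class,
    # falling back to 0.5 for classes without an optimised threshold
    per_detection_threshold = np.array([
        thresholds_dict.get(class_names[class_id], 0.5)
        for class_id in detections.class_id
    ])
    return detections[detections.confidence >= per_detection_threshold]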

@SkalskiP
Collaborator

SkalskiP commented Feb 1, 2024

Understood. This sounds interesting to me. I'm worried about scope, especially if we want to reimplement all metrics.

  • We need to divide the work into smaller chunks. Reviews go slowly for me when I need to go through 2k lines of code; on top of that, we could assign different tasks to different external contributors and speed up the work.
  • We need to develop an MVP - the shortest path demonstrating the value of the potential solution. With only one metric, for example.

@RigvedRocks

@GeorgePearse Fine by me. You can create a new branch and then I can refine your initial solution.

@GeorgePearse
Author

GeorgePearse commented Feb 4, 2024

From a look through the metric functionality already implemented, it looks like it wouldn't be too painful to add. The object that comes out of it looks like it has already done most of the upfront maths needed.

It's hard for me to tell just from a look: does the output structure contain the scores?

[screenshot of the metrics output structure]

@SkalskiP
Collaborator

SkalskiP commented Feb 5, 2024

@GeorgePearse, could you be a bit more specific? What do you mean by scores?
