Library to calculate the optimal cut-off (and corresponding expected gain) of a classifier

nmusmeci/classifier-cutoff-calculator

Classifier cut-off calculator

Python class that calculates the optimal cut-off for a classifier based on a cost/benefit analysis.

Use case

Whenever a classifier is used in a commercial setting, there is a cost incurred when the classifier is wrong and a gain (or profit) when the classifier is right. For example, imagine a classifier trained to identify customers to include in a marketing email: such a model generates a profit when it recommends a customer who ends up converting, but it incurs a cost when it recommends a customer who doesn't convert and unsubscribes from future marketing emails.

The functions in this repository make it very easy to perform this cost-benefit calculation for a given classifier.
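To make the arithmetic concrete, here is a minimal sketch of the net gain at a single threshold for the marketing-email example. It is not the repository's API: the function name `net_gain`, the parameters `gain_tp` and `cost_fp`, and all numbers are hypothetical and chosen only for illustration.

```python
def net_gain(y_true, y_score, threshold, gain_tp, cost_fp):
    """Net gain when every customer scoring at or above `threshold` is emailed.

    Hypothetical helper for illustration only, not part of this repository.
    gain_tp: profit from an emailed customer who converts (true positive).
    cost_fp: cost of an emailed customer who does not convert (false positive).
    """
    total = 0.0
    for label, score in zip(y_true, y_score):
        if score >= threshold:  # the classifier recommends this customer
            total += gain_tp if label == 1 else -cost_fp
    return total


# Made-up data: 1 = converted, 0 = did not convert
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]

# Three customers are emailed: two convert (+10 each), one does not (-4)
print(net_gain(y_true, y_score, threshold=0.5, gain_tp=10.0, cost_fp=4.0))  # 16.0
```

Customers below the threshold are not emailed, so in this simplified sketch they contribute neither a gain nor a cost.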

Overview

The functions in this repo let the user calculate the expected net gain (i.e. total gain minus total cost) for each value of the classifier's threshold. As inputs, the class needs quantitative estimates of these costs and gains, as well as data on the classifier's out-of-sample performance (y_true and y_score). The class then calculates the threshold that maximizes the net gain. This provides a problem-specific, business-driven answer to the question of finding the optimal cut-off for a classifier, as opposed to generic approaches such as maximizing the F1 score. See the screenshot below from the tutorial notebook.

Note: the numerical values of costs and benefits are inputs; they need to be derived from the specifics of the commercial application (e.g. in the marketing email example they could come from customer lifetime value models).
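The threshold sweep described above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the class shipped in this repository: the function names `net_gain_curve` and `optimal_cutoff`, the four cost/gain parameters, and the sample data are all hypothetical. Each distinct score is tried as a candidate cut-off.

```python
def net_gain_curve(y_true, y_score, gain_tp, cost_fp, gain_tn=0.0, cost_fn=0.0):
    """Return (threshold, net gain) pairs, one per distinct score.

    Hypothetical helper for illustration only, not part of this repository.
    A sample is predicted positive when its score is >= the threshold.
    """
    curve = []
    for threshold in sorted(set(y_score), reverse=True):
        total = 0.0
        for label, score in zip(y_true, y_score):
            predicted_positive = score >= threshold
            if predicted_positive and label == 1:
                total += gain_tp   # true positive
            elif predicted_positive:
                total -= cost_fp   # false positive
            elif label == 1:
                total -= cost_fn   # false negative
            else:
                total += gain_tn   # true negative
        curve.append((threshold, total))
    return curve


def optimal_cutoff(y_true, y_score, **costs_and_gains):
    """Return the (threshold, net gain) pair with the highest net gain."""
    return max(net_gain_curve(y_true, y_score, **costs_and_gains),
               key=lambda pair: pair[1])


# Made-up out-of-sample labels and scores
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

best_threshold, best_gain = optimal_cutoff(y_true, y_score, gain_tp=10.0, cost_fp=4.0)
print(best_threshold, best_gain)  # 0.6 26.0
```

With these made-up numbers the best cut-off is 0.6: lowering it further only adds non-converting customers, each of which costs 4. The sketch does not consider a threshold above every score (recommending nobody), which would yield a net gain of zero under the default parameters.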

This approach is inspired by Chapter 7 of the book Data Science for Business by Foster Provost and Tom Fawcett.
