Add "scoring" argument to score
#28995
Comments
Well, there is already a separate module for importing metrics. Can't we just make a bridge between the scoring parameter you want and the sklearn metrics module, so that it refers to the metric from that module? Besides this, there are a lot of non-accuracy metrics... which ones should we focus on? |
I agree that the current `score` is limited. But IMO, the question is a bit broader: does scikit-learn want to have more syntactic sugar or not? If yes, then adding the possibility of choosing the metric to the `score` method makes sense. |
The scorers are the bridge to the metrics; see scikit-learn/sklearn/metrics/_scorer.py, lines 810 to 814 (at 5e5cc34).
I'm okay with adding a `scoring` parameter to `score`. Today, the user goes through a scorer:

# User needs to know about `get_scorer`
from sklearn.metrics import get_scorer

roc_auc_scorer = get_scorer("roc_auc")
roc_auc_scorer(est, X_test, y_test)

Or using the metrics directly:

from sklearn.metrics import roc_auc_score

# User needs to know that `roc_auc_score` requires `y_score`:
y_decision = est.decision_function(X_test)
roc_auc_score(y_test, y_decision)

# Or if the estimator only has `predict_proba`:
roc_auc_score(y_test, est.predict_proba(X_test)[:, 1])

With this proposal, we can write `est.score(X_test, y_test, scoring="roc_auc")`. |
I would be in favor of deprecating default scorers. A few related points:
This probably does need a SLEP. |
I think it needs a SLEP because it is going to change the API deeply. Regarding the default scorers: given how difficult it is to select the right metric for each use case, not having a default might be better.
To me this is an extension of the API. We don't need to settle on this at first. Having the
I would say yes, but it means that we need to properly define what the scorer API is. For instance, there is already this difference between a metric scorer and a curve scorer (which does not exist yet but probably should). |
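For context, today's metric scorer is just a callable with the signature `(estimator, X, y) -> float`, which is the contract that would need formalising; a minimal sketch (the hand-written `balanced_accuracy_scorer` below is purely illustrative, not part of any proposal):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

def balanced_accuracy_scorer(estimator, X, y):
    # Scorer contract: takes (estimator, X, y), returns a float, greater is better.
    return balanced_accuracy_score(y, estimator.predict(X))

X, y = make_classification(random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(balanced_accuracy_scorer(clf, X, y))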
Why call the argument `scoring`?
I am -1 on removing the default. I think providing a default metric as part of an estimator is useful for users. A bit like setting something like the number of trees in a random forest: while ten (I think the current default) is rarely the right value, I think it is useful that step 1 of using a random forest doesn't force you to make that choice. Same with the metric you use to evaluate your setup. You almost always want to think about it and make a deliberate choice, but the fact that there is a default to get you started is a good thing imho. |
I agree on not adding
Can you elaborate on that? I think I'm not familiar with the proposal. @betatim I suggested
It's 100, we fixed that quite a while ago :) I think the default parameters and the default metric have indeed similar properties, though I'd say the consequences of the choice of metric are even worse. There are some hyper-parameters that have a huge impact, like the kernel in SVC, but I think not tuning hyper-parameters is rarely as bad as using accuracy in an imbalanced classification problem (which is all classification problems). I proposed two changes: adding the keyword arg, and removing the default. Just adding the keyword arg would at least smooth the path from using accuracy to using something more relevant. |
What would the default value of the scorer be? `scoring=None`, as in GridSearchCV etc.? Moreover, the scorer would vary a lot when running multiple benchmark models. My question is: would you settle on None as the default value? |
I think we're hurting people by not forcing them to think about their performance metrics. Those metrics have real-life consequences, and our defaults make no sense and have nothing to do with the real-life use case at hand for the user. We should have a much better documentation page on how to choose the metric for that purpose, and then we should force people to choose themselves. |
So....
Or
I'm happy to write the SLEP after the NeurIPS deadline (and my vacation). |
I'd go more with the first version; it feels smoother. Happy to review the SLEP. |
Will you open a PR...coz I'm very eager to contribute..... |
@PragyanTiwari this first needs extensive discussion via a SLEP, which @amueller will work on, and then we'll get to the implementation. This is also not a good first contribution candidate. |
@PragyanTiwari This is probably not the best issue to start with, because we are going to need a scikit-learn enhancement proposal (SLEP) that will require much more thinking before actually doing a PR that is going to be merged. |
When speaking about the curve scorer, I'm referring to |
Ok....thanks for the info |
Ah ok, well, then let's stick to it :D
Agreed. For me the question is whether forcing people to think about it is the solution. My guess as to why people keep doing "crazy" things like accuracy for imbalanced problems is that they simply don't realise it and have no idea where to even start thinking about this. Which isn't a novel insight, so it has probably been discussed to death. The philosophical question (for me) is whether we actually do these kinds of people a service by forcing them to make a decision about something that they don't even know where to start reading/informing themselves about. I am sure people use tools that are easy to use and get started with, no matter how methodologically flawed they are. Which makes me think we need to maintain an easy on-ramp and then find a way of educating people after the fact, somehow (the hard part). Otherwise they will leave and use something else.
The keyword arg change is an easy yes for me. Removing the default is a more difficult/philosophical question for me on which I flipflop regarding "the thing we should do" :-/ |
The question on my mind is this: will `scoring` accept custom scorers (callables)? Because, IMO, this is crucial functionality, especially if we are talking about choosing the right metric for the task. A lot of real-world problems require specific mathematical or business metrics that users will need to code themselves. |
@glevv `scoring` will accept a "scorer" as provided by `make_scorer`. |
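For context, custom metrics can already be wrapped into scorer objects with `make_scorer` and passed wherever a `scoring` argument is accepted today; `profit_per_prediction` below is a made-up business metric, not an existing function:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score

def profit_per_prediction(y_true, y_pred):
    # Hypothetical business metric: reward true positives, penalise false positives.
    gain = 10.0 * np.sum((y_pred == 1) & (y_true == 1))
    cost = 2.0 * np.sum((y_pred == 1) & (y_true == 0))
    return (gain - cost) / len(y_true)

profit_scorer = make_scorer(profit_per_prediction, greater_is_better=True)

X, y = make_classification(weights=[0.8, 0.2], random_state=0)
print(cross_val_score(LogisticRegression(max_iter=1000), X, y, scoring=profit_scorer))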
That's why, to me, it's crucial to have a page where people can easily understand what they should use. We should write that first. And then, when they don't provide a scorer, we raise an error with a link to the page where they can read about it. |
Related discussion in #29065. I'm using the following example:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

estimator = LogisticRegression(n_jobs=-1)
param_grid = {"C": [10, 1.0, 0.1]}
score = ["accuracy", "balanced_accuracy", "precision_macro", "recall_macro", "f1_macro"]

grid_search = GridSearchCV(estimator, param_grid, scoring=score, refit="accuracy")
grid_search.fit(X_train, y_train)
grid_search.score(X_test, y_test)

At the moment, evaluating all of these metrics on the test set takes:

metrics = {}
for name, scorer in grid_search.scorer_.items():
    score = scorer(grid_search, X_test, y_test)
    metrics[name] = score

IMO, sklearn is all about syntactic sugar. The fact that I can perform model training, cross validation, hyperparameter tuning, and refitting on a single line is awesome. The fact that evaluating on test takes 4 lines is a bit of a surprise. |
In the case of #28995 (comment), if we introduce the `scoring` argument, the loop above is no longer needed. I'm liking the proposal more because it can also support multiple metrics, i.e.:

est.score(
    X, y,
    scoring=["accuracy", "balanced_accuracy", "precision_macro", "recall_macro", "f1_macro"],
) |
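For comparison, the closest today's API gets to that multi-metric call for a plain, already-fitted estimator is looping over scorer names yourself; a minimal sketch using only existing functions:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer

X, y = load_iris(return_X_y=True)
est = LogisticRegression(max_iter=1000).fit(X, y)

# Look up each built-in scorer by name and apply it to the fitted estimator.
names = ["accuracy", "balanced_accuracy", "precision_macro", "recall_macro", "f1_macro"]
scores = {name: get_scorer(name)(est, X, y) for name in names}
print(scores)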
Can I open a pr for this?? |
@pranav-bot no, this is still under discussion, and a hard one to tackle. Please try some of the easier issues, or easy stalled PRs, first. |
Describe the workflow you want to enable
I want to enable non-accuracy metrics to `estimator.score`, and ultimately deprecate the default values of `accuracy` and `r2`. I would call it `scoring`, though it's a bit redundant but consistent. That would allow us to get rid of the default scoring methods, which are objectively bad and misleading, and it would require the minimum amount of code changes for anyone.
Describe your proposed solution
Replace
with
or rather
(or `r2` for regression).
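A minimal sketch of the idea, using only today's API (the `scoring=` keyword on `score` and the choice of `balanced_accuracy` here are assumptions, not existing behaviour):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer
from sklearn.model_selection import train_test_split

X, y = make_classification(weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
est = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Today: the metric is implicit (accuracy for classifiers, r2 for regressors).
print(est.score(X_test, y_test))

# Today's explicit equivalent goes through a scorer object:
print(get_scorer("balanced_accuracy")(est, X_test, y_test))

# Proposed (not existing API): est.score(X_test, y_test, scoring="balanced_accuracy")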
Describe alternatives you've considered, if relevant
- Keep the current defaults (but `accuracy` and `r2` are bad).
- Add a separate `scoring` method.

I can't think of any other alternatives tbh.
Additional context
I think in theory this requires a SLEP, as it's changing shared API, right?