Allow ModelVisualizers to wrap Pipeline objects #498

bbengfort · 2018-07-13T13:06:32Z

Describe the solution you'd like

Our model visualizers expect to wrap classifiers, regressors, or clusterers in order to visualize the model under the hood; they even check that the right kind of estimator is passed in. Unfortunately, in many cases passing a Pipeline object as the model does not allow the visualizer to work, even though the pipeline is an acceptable model, e.g. it is a classifier for classification score visualizers (more on this below). This is primarily because the Pipeline wrapper masks the attributes needed by the visualizer.

I propose that we change the ModelVisualizer.estimator attribute to a @property. When setting the estimator property, we can perform a check to ensure that the Pipeline has a final_estimator attribute (i.e. that it is not a transformer pipeline). When getting the estimator property, we can return the final estimator instead of the entire Pipeline. This should ensure that we can use pipelines in our model visualizers.

NOTE however that we will still have to fit(), predict(), and score() on the entire pipeline, so this is a bit more nuanced than it seems on first glance. There will probably have to be is_pipeline() checking and other estimator access utilities.
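A minimal sketch of the idea above, not the actual Yellowbrick implementation. To stay dependency-free it uses hypothetical stand-ins (`FakePipeline`, `Doubler`, `ThresholdClassifier`) in place of scikit-learn objects, and the names `ModelVisualizerSketch`, `is_pipeline`, and `get_final_estimator` are assumptions made for illustration. It shows the `estimator` property returning the final step while `fit()` and `predict()` still run through the whole pipeline.

```python
class Doubler:
    """Hypothetical toy transformer: doubles every value."""
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [2 * x for x in X]


class ThresholdClassifier:
    """Hypothetical toy classifier: splits on the mean of the training data."""
    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.threshold_ = sum(X) / len(X)
        return self

    def predict(self, X):
        low, high = self.classes_[0], self.classes_[-1]
        return [high if x > self.threshold_ else low for x in X]


class FakePipeline:
    """Hypothetical stand-in for a pipeline: a list of (name, step) pairs."""
    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y=None):
        for _, step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        for _, step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1][1].predict(X)


def is_pipeline(model):
    # Duck-typed check (assumption); real code would more likely use isinstance.
    return isinstance(getattr(model, "steps", None), list)


def get_final_estimator(model):
    """Return the last step of a pipeline, or the model itself otherwise."""
    if not is_pipeline(model):
        return model
    final = model.steps[-1][1]
    # Simplification: a last step without predict() is treated as transformer-only.
    if not hasattr(final, "predict"):
        raise TypeError("pipeline has no final estimator to visualize")
    return final


class ModelVisualizerSketch:
    def __init__(self, model):
        self.estimator = model  # routed through the property setter

    @property
    def estimator(self):
        # Attribute access (classes_, coef_, ...) targets the final estimator.
        return get_final_estimator(self._wrapped)

    @estimator.setter
    def estimator(self, model):
        get_final_estimator(model)  # validate before storing
        self._wrapped = model

    def fit(self, X, y=None):
        self._wrapped.fit(X, y)  # the whole pipeline is fitted, not just the last step
        return self

    def predict(self, X):
        return self._wrapped.predict(X)

    @property
    def classes_(self):
        return self.estimator.classes_


model = FakePipeline([("double", Doubler()), ("clf", ThresholdClassifier())])
viz = ModelVisualizerSketch(model).fit([1, 2, 3, 4], [0, 0, 1, 1])
print(viz.classes_)         # [0, 1]
print(viz.predict([1, 4]))  # [0, 1]
```

Keeping the wrapped object private and exposing only the final step through the property is one possible design; it means subclasses that read `self.estimator.classes_` would not need to know whether they were given a pipeline.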

Is your feature request related to a problem? Please describe.

Consider the following, fairly common code:

from sklearn.pipeline import Pipeline
from sklearn.neural_network import MLPClassifier 
from sklearn.feature_extraction.text import TfidfVectorizer 

from yellowbrick.classifier import ClassificationReport 

model = Pipeline([
    ('tfidf', TfidfVectorizer()), 
    ('mlp', MLPClassifier()), 
]) 

oz = ClassificationReport(model)
oz.fit(X_train, y_train)
oz.score(X_test, y_test)
oz.poof()

This seems to be a valid model for a classification report. Unfortunately, the classification report is not able to access the MLPClassifier's classes_ attribute, since the Pipeline doesn't know how to pass that request on to the final estimator.

I think the original idea for the ScoreVisualizers was that they would be inside of Pipelines, e.g.

model = Pipeline([
    ('tfidf', TfidfVectorizer()), 
    ('clf', ClassificationReport(MLPClassifier())), 
]) 

model.fit(X, y)
model.score(X_test, y_test)
model.named_steps['clf'].poof()

But this makes it difficult to use more than one visualizer; e.g. ROCAUC visualizer and CR visualizer.

Definition of Done

Update ModelVisualizer class with pipeline helpers
Ensure current tests pass
Add test to all model visualizer subclasses to pass in a pipeline as the estimator
Add documentation about using visualizers with pipelines


Yogayu · 2019-04-07T15:59:32Z

Dear Mentor @bbengfort ,

I am very interested in this project for GSoC 2019, so I spent two days studying the core concepts of Yellowbrick and the structure of the library. However, some parts of this idea are still confusing to me, so it would help greatly if you could discuss them with me.

Brief introduction

First of all, I would like to introduce my understanding of the project. Since I am just beginning to get familiar with this project, there may be some mistakes in my understanding. If there is something wrong with my understanding, I really hope you can correct it.

Yellowbrick's core concept is to better assist decision-making in various processes of machine learning through a visual approach. So the visual object is the core foundation.

The root visual object, Visualizer, inherits from Scikit-Learn's BaseEstimator class. It adds the Visualizer interface to enable visualization.

ModelVisualizer inherits from Visualizer and Wrapper. It is similar to the adaptor in the adaptor pattern (also known as a wrapper): ModelVisualizer wraps a model. More classes inherit from ModelVisualizer, such as ScoreVisualizer.

The relationship of them in UML is like this:

The Feature: Allow ModelVisualizers to wrap Pipeline objects

In the above, you describe that the ModelVisualizer can wrap the model. Also, in many cases, a pipeline object can be passed to the ModelVisualizer as a model. However, the problem is that Pipeline encapsulates the real model final_estimator, so the ModelVisualizer can't directly interact with final_estimator, but only via Pipeline.

The relationship of them is like this:

Current Status

I read the source code, and it shows that the current implementation works as follows: when a pipeline object is passed to a ModelVisualizer, the pipeline is assigned directly to the ModelVisualizer's estimator. The related fit(), predict(), and score() methods are also called directly on the Pipeline.

My Question

Q1: Do you mean that ModelVisualizer should provide an interface (e.g. a way to get final_estimator) through which subclasses can access the final_estimator's classes_ attribute?

Q2: The source code of Pipeline shows that we can get the final estimator through pipeline.steps[-1][1]. Is this the right way?
Q3: Under what circumstances do we need to use the classes_ attribute? Could you give me an example?
Q4: The Pipeline itself already has fit(), predict(), and score() methods. When these are called in the ModelVisualizer class and its subclasses, can we use the Pipeline methods directly, or do we have to implement them ourselves?

I'm sorry that I wrote a little too much, but it is for the purpose of clearer expression and communication. If my question is too simple, please forgive me. You know, everyone has a beginning.

Thank you for your time and patience. Looking forward to your reply.

Thank you and best regards!
Xinyu You

Yogayu · 2019-04-08T12:38:23Z

Respected Mentors @rebeccabilbro @bbengfort @lwgray @ndanielsen @pdamodaran @wagner2010,
Sorry to bother you again. Is there anyone who can help me with this problem? There is little time left before the GSoC proposal must be submitted.

wagner2010 · 2019-04-08T14:51:54Z

Hi @Yogayu wow this is great! Thanks so much for your work here in such a short period of time. I also did see your posts on the Google group which I will be responding to shortly. Because, as you mentioned, time is running short and because we’re not available to respond in detail please go ahead and incorporate this into your proposal and submit it. We believe you’re on the right track for a strong proposal and we’re not expecting perfection so much as seeing your code and your vision for this problem. Regardless of the outcome as it relates to GSoC we welcome your involvement with Yellowbrick! Thanks and stay well.

Yogayu · 2019-04-08T16:07:23Z

@wagner2010 I understand. Thank you.

bbengfort added the labels "type: feature" (a new visualizer or utility for yb), "priority: medium" (can wait until after next release), and "level: intermediate" (python coding expertise required) Jul 13, 2018

bbengfort added this to the v1.0 pre-work milestone Jan 2, 2019

wagner2010 added the GSoC-2019 label Jan 23, 2019

wagner2010 removed the GSoC-2019 label Feb 2, 2019

Yogayu mentioned this issue Apr 8, 2019

Write a blog post on a machine learning project highlighting Yellowbrick. (GSoC-2019) #691

Closed

bbengfort mentioned this issue May 3, 2019

Using multiple classifier visualizers #829

Closed

lwgray assigned bbengfort May 21, 2019

bbengfort linked a pull request Aug 27, 2019 that will close this issue

[WIP] Adds support for Pipelines with ModelVisualizers #955

Open

13 tasks

bbengfort modified the milestones: v1.0, v1.1 Aug 28, 2019

bbengfort modified the milestones: v1.1, v1.3 Jan 8, 2020

This was referenced May 13, 2022

Fixes #498 Modifies ModelVisualizers to wrap Pipeline objects #1245

Closed

Create test for utilizing Sklearn pipelines with Visualizers #1248

Closed

lwgray mentioned this issue May 28, 2022

Add test for Sklearn Pipeline - CVScores #1253

Closed

1 task
