Add Sklearn pipeline test for more complicated Visualizers #1255

Open
9 tasks
lwgray opened this issue May 28, 2022 · 0 comments
Labels
level: intermediate python coding expertise required

lwgray commented May 28, 2022

Some visualizers require additional work before sklearn pipeline tests can be written for them. The underlying visualizer likely needs to expose the learned attributes required to draw the visualization. For example, using an sklearn pipeline with the InterClusterDistanceMetric visualizer fails with:

AttributeError: 'Pipeline' object has no attribute 'cluster_centers_'
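
The error can be reproduced with scikit-learn alone, without any visualizer involved. The sketch below is only an illustration: the random data, the step names, and the `final_estimator` helper are assumptions made for this example, not part of Yellowbrick. It shows that the learned attribute lives on the last step of the pipeline rather than on the `Pipeline` object itself, which is probably what a visualizer would have to account for.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Illustrative random data: 60 samples, 4 features.
rng = np.random.RandomState(0)
X = rng.rand(60, 4)

pipe = Pipeline([
    ("minmax", MinMaxScaler()),
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
])
pipe.fit(X)

# The Pipeline does not forward learned attributes of its final step.
try:
    pipe.cluster_centers_
    raised = False
except AttributeError:
    raised = True


def final_estimator(model):
    """Hypothetical helper: unwrap a Pipeline to its last step."""
    if isinstance(model, Pipeline):
        return model.steps[-1][1]
    return model


# The fitted KMeans step does carry the attribute.
final = final_estimator(pipe)
centers = final.cluster_centers_
```

A visualizer that reads `cluster_centers_` directly from its estimator would need some unwrapping step of this kind, or another way to expose the attribute, before a pipeline can be passed as the model.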

See related issues and PRs:
#1253
#1248
#1249

Issue #1257, PR #1259

Issue #1256, PR #1262

  • Decision Boundaries
  • RFECV
  • ValidationCurve
  • Add a pipeline model input test and quick method test for feature importances
  • Add a pipeline model input test and quick method test for alpha selection
  • Add a pipeline model input test and quick method test for InterClusterDistanceMetric
  • KElbowVisualizer
  • SilhouetteVisualizer
  • GridSearchColorPlot

Example, using CVScores (the methods belong to a test class that provides assert_images_similar):

    def test_within_pipeline(self):
        """
        Test that visualizer can be accessed within a sklearn pipeline
        """
        X, y = load_mushroom(return_dataset=True).to_numpy()
        X = OneHotEncoder().fit_transform(X).toarray()

        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=11)
        model = Pipeline([
            ('minmax', MinMaxScaler()), 
            ('cvscores', CVScores(BernoulliNB(), cv=cv))
        ])

        model.fit(X, y)
        model['cvscores'].finalize()
        self.assert_images_similar(model['cvscores'], tol=2.0)

    def test_within_pipeline_quickmethod(self):
        """
        Test that visualizer quickmethod can be accessed within a
        sklearn pipeline
        """
        X, y = load_mushroom(return_dataset=True).to_numpy()
        X = OneHotEncoder().fit_transform(X).toarray()
        
        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=11)
        model = Pipeline([
            ('minmax', MinMaxScaler()), 
            ('cvscores', cv_scores(BernoulliNB(), X, y, cv=cv, show=False,
                                   random_state=42))
        ])
        self.assert_images_similar(model['cvscores'], tol=2.0)

    def test_pipeline_as_model_input(self):
        """
        Test that visualizer can handle sklearn pipeline as model input
        """
        X, y = load_mushroom(return_dataset=True).to_numpy()
        X = OneHotEncoder().fit_transform(X).toarray()

        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=11)
        model = Pipeline([
            ('minmax', MinMaxScaler()), 
            ('nb', BernoulliNB())
        ])

        oz = CVScores(model, cv=cv)
        oz.fit(X, y)
        oz.finalize()
        self.assert_images_similar(oz, tol=2.0)

    def test_pipeline_as_model_input_quickmethod(self):
        """
        Test that visualizer can handle sklearn pipeline as model input
        within a quickmethod
        """
        X, y = load_mushroom(return_dataset=True).to_numpy()
        X = OneHotEncoder().fit_transform(X).toarray()

        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=11)
        model = Pipeline([
            ('minmax', MinMaxScaler()), 
            ('nb', BernoulliNB())
        ])

        oz = cv_scores(model, X, y, show=False, cv=cv)
        self.assert_images_similar(oz, tol=2.0)

@DistrictDataLabs/team-oz-maintainers

lwgray added the "level: intermediate python coding expertise required" label on May 28, 2022