FernandoLpz/TPOT-Optimal-Pipeline-Searching

TPOT: Pipelines Optimization with Genetic Algorithms

This repository contains an implementation of TPOT for obtaining optimal pipelines with the use of genetic algorithms.

If you want to know more about TPOT, how it works, and what its components are, I recommend reading the blog post: TPOT: Pipelines Optimization with Genetic Algorithms

1. Files

  • main.py: Contains the implementation of TPOT Classifier.
  • optimal_pipeline.py: Contains the optimal pipeline suggested by TPOT Classifier, exported once the optimization has run.

2. How to use

I recommend working in a virtual environment; in this case I am using pipenv. To install the dependencies listed in the Pipfile, type:

pipenv install

and then

pipenv shell

To optimize the pipeline with TPOT Classifier, first comment out the call to train_suggested_tpot() in main.py so that only the optimization step runs:

if __name__ == "__main__":
    automl = AutoML()
    automl.load_data()
    automl.pipeline_optimization()
    # automl.train_suggested_tpot()

then run:

python -Bi main.py

Because of the -i flag, Python stays in interactive mode after the script finishes. Once the optimization has finished, type the following in the Python console:

automl.model.export('optimal_pipeline.py')
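If you would rather not type the export call by hand, the search and the export can be combined in one helper. The following is a minimal sketch, assuming the classic TPOT 0.x API (TPOTClassifier with fit and export); the function names and the setting values are hypothetical and are not taken from main.py:

```python
def default_tpot_settings():
    """Illustrative search settings; the values used in main.py may differ."""
    return {
        "generations": 5,        # number of genetic-algorithm iterations
        "population_size": 20,   # pipelines kept in each generation
        "cv": 5,                 # cross-validation folds used for scoring
        "random_state": 42,      # makes the search reproducible
        "verbosity": 2,          # print progress during the search
    }


def optimize_and_export(x_train, y_train, output_path="optimal_pipeline.py", settings=None):
    """Run the TPOT search and write the best pipeline to output_path."""
    # Imported lazily so this module can be loaded without TPOT installed.
    from tpot import TPOTClassifier

    model = TPOTClassifier(**(settings or default_tpot_settings()))
    model.fit(x_train, y_train)   # run the genetic search
    model.export(output_path)     # write the best pipeline as Python code
    return model
```

Calling optimize_and_export(x_train, y_train) with your own training data would then replace the interactive step. Note that the search can take a long time with larger settings.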

The export call overwrites the file optimal_pipeline.py. Open optimal_pipeline.py and copy the pipeline definition, which looks like this:

# Average CV score on the training set was: 0.9347254053136407
exported_pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    VarianceThreshold(threshold=0.2),
    ZeroCount(),
    GradientBoostingClassifier(learning_rate=1.0, max_depth=10, max_features=0.9000000000000001, min_samples_leaf=16, min_samples_split=3, n_estimators=100, subsample=0.7000000000000001)
)

Paste the pipeline definition into the following method in main.py:

def pipeline_suggested_by_tpot(self):
    # Copied from the optimal pipeline suggested by TPOT in "optimal_pipeline.py"
    exported_pipeline = make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
        VarianceThreshold(threshold=0.2),
        ZeroCount(),
        GradientBoostingClassifier(learning_rate=1.0, max_depth=10, max_features=0.9000000000000001, min_samples_leaf=16, min_samples_split=3, n_estimators=100, subsample=0.7000000000000001)
    )
    # Train the pipeline on the training split
    exported_pipeline.fit(self.x_train, self.y_train)

    # Report accuracy on the training and test splits
    print(f"Train acc: {exported_pipeline.score(self.x_train, self.y_train)}")
    print(f"Test acc: {exported_pipeline.score(self.x_test, self.y_test)}")
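As an alternative to copying by hand, the exported_pipeline assignment could be pulled out of the exported file programmatically. This is a minimal sketch that uses only the standard library ast module; the helper name extract_pipeline_source is hypothetical and not part of this repository, and the sample string stands in for the contents of optimal_pipeline.py:

```python
import ast


def extract_pipeline_source(source, variable="exported_pipeline"):
    """Return the source text of the first assignment to `variable`, or None."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            names = [t.id for t in node.targets if isinstance(t, ast.Name)]
            if variable in names:
                # get_source_segment returns the exact text of the statement
                return ast.get_source_segment(source, node)
    return None


# Shortened stand-in for an exported file; a real one has more steps.
sample = '''exported_pipeline = make_pipeline(
    VarianceThreshold(threshold=0.2),
    GaussianNB()
)
exported_pipeline.fit(training_features, training_target)
'''

snippet = extract_pipeline_source(sample)
print(snippet)
```

Parsing with ast does not execute the file, so the names inside the pipeline do not need to be importable for the extraction to work.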

Great, the last step is to run main.py again, this time with the optimization call commented out and the training call enabled:

if __name__ == "__main__":
    automl = AutoML()
    automl.load_data()
    # automl.pipeline_optimization()
    automl.train_suggested_tpot()

That is it!
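Commenting lines in and out works, but the two modes could also be selected from the command line. The following is a minimal sketch using argparse; the --mode flag and the parse_args and run helpers are hypothetical additions, and automl is assumed to expose the three methods called in main.py:

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(
        description="Run the TPOT search or train the suggested pipeline."
    )
    parser.add_argument(
        "--mode",
        choices=["optimize", "train"],
        default="optimize",
        help="optimize: search for a pipeline; train: fit the suggested one",
    )
    return parser.parse_args(argv)


def run(automl, mode):
    """Load the data, then run the step selected by mode."""
    automl.load_data()
    if mode == "optimize":
        automl.pipeline_optimization()
    else:
        automl.train_suggested_tpot()
```

In main.py, the main block would then become run(AutoML(), parse_args().mode), invoked for example as python -Bi main.py --mode optimize.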

3. Contributing

Feel free to fork the project and add your own suggestions.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/YourGreatFeature)
  3. Commit your Changes (git commit -m 'Add some YourGreatFeature')
  4. Push to the Branch (git push origin feature/YourGreatFeature)
  5. Open a Pull Request

4. Contact

If you have any questions, feel free to reach out to me at:

5. License

Distributed under the MIT License. See LICENSE.md for more information.