TPOT: Pipelines Optimization with Genetic Algorithms

This repository shows how to use TPOT to search for an optimal machine learning pipeline with genetic algorithms.

To learn more about TPOT, how it works and what its components are, I recommend reading the blog post: TPOT: Pipelines Optimization with Genetic Algorithms

1. Files

main.py: contains the AutoML class, which runs the TPOTClassifier pipeline search and trains the suggested pipeline.
optimal_pipeline.py: contains the optimal pipeline exported by TPOTClassifier once the search has finished.

2. How to use

I recommend working in a virtual environment; this project uses pipenv. To install the dependencies listed in the Pipfile, run:

pipenv install

and then activate the environment:

pipenv shell

To optimize the pipeline with TPOTClassifier, first make sure the training call is commented out in the __main__ block of main.py, so that only the optimization runs:

if __name__ == "__main__":
    automl = AutoML()
    automl.load_data()
    automl.pipeline_optimization()
    # automl.train_suggested_tpot()

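Then run the following (-i keeps the Python interpreter open once main.py finishes, so the automl object stays available; -B stops Python from writing .pyc files):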

python -Bi main.py

Once the optimization has finished, type the following in the Python console:

automl.model.export('optimal_pipeline.py')
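If you would rather not keep an interactive session open, the export can also happen in the same script right after the search. The sketch below is hypothetical (optimize_and_export is not part of main.py) and assumes model is a TPOTClassifier, or any object with fit(X, y) and export(path) methods:

```python
from pathlib import Path


def optimize_and_export(model, x_train, y_train, output_path="optimal_pipeline.py"):
    """Run the pipeline search, then write the best pipeline found to a .py file.

    Hypothetical helper: `model` is assumed to be a TPOTClassifier (or any object
    exposing fit(X, y) and export(path)). The target file is overwritten.
    """
    model.fit(x_train, y_train)     # genetic search over candidate pipelines
    model.export(str(output_path))  # write the winning pipeline as Python code
    return Path(output_path)
```

For example, optimize_and_export(TPOTClassifier(generations=5, population_size=20, verbosity=2, random_state=42), automl.x_train, automl.y_train), with TPOTClassifier imported from tpot. The generation and population sizes here are illustrative, not the values used in this repository.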

Calling automl.model.export(...) overwrites optimal_pipeline.py. Open optimal_pipeline.py and copy the pipeline definition (plus any imports main.py is missing), which looks like this:

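# Imports required by the pipeline below
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from tpot.builtins import ZeroCount

# Average CV score on the training set was: 0.9347254053136407
exported_pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    VarianceThreshold(threshold=0.2),
    ZeroCount(),
    GradientBoostingClassifier(learning_rate=1.0, max_depth=10, max_features=0.9000000000000001, min_samples_leaf=16, min_samples_split=3, n_estimators=100, subsample=0.7000000000000001)
)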
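Copying the definition by hand is easy to get wrong with long parameter lists. As a hedged alternative, a small hypothetical helper can pull the exported_pipeline = ... assignment out of the exported file with Python's standard ast module. It only parses the file, never executes it, and assumes the assignment sits at module level, as in the snippet above (ast.get_source_segment needs Python 3.8+):

```python
import ast
from pathlib import Path


def extract_pipeline_definition(source: str, name: str = "exported_pipeline") -> str:
    """Return the source text of the first top-level assignment to `name`.

    Hypothetical helper: assumes the pipeline is bound with a plain
    `exported_pipeline = ...` statement at module level.
    """
    tree = ast.parse(source)  # parse only; the exported code is not executed
    for node in tree.body:
        if isinstance(node, ast.Assign) and any(
            isinstance(target, ast.Name) and target.id == name
            for target in node.targets
        ):
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                return segment
    raise ValueError(f"no top-level assignment to {name!r} found")


def read_pipeline_definition(path: str = "optimal_pipeline.py") -> str:
    """Convenience wrapper that reads the exported file from disk."""
    return extract_pipeline_definition(Path(path).read_text())
```

Running print(read_pipeline_definition()) in the project folder prints the assignment ready to paste; the imports still have to be copied separately.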

Paste the definition into the pipeline_suggested_by_tpot method in main.py, like this:

def pipeline_suggested_by_tpot(self):
    # Copied from the optimal pipeline suggested by TPOT in "optimal_pipeline.py"
    exported_pipeline = make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
        VarianceThreshold(threshold=0.2),
        ZeroCount(),
        GradientBoostingClassifier(learning_rate=1.0, max_depth=10, max_features=0.9000000000000001, min_samples_leaf=16, min_samples_split=3, n_estimators=100, subsample=0.7000000000000001)
    )
    # Fit the pipeline on the training split
    exported_pipeline.fit(self.x_train, self.y_train)

    # Report accuracy on the training and test splits
    print(f"Train acc: {exported_pipeline.score(self.x_train, self.y_train)}")
    print(f"Test acc: {exported_pipeline.score(self.x_test, self.y_test)}")
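Pasting code is one way to reuse the result. Another, sketched here under the assumption that both runs use the same environment (same tpot and scikit-learn versions), is to save the already-fitted best pipeline, which TPOTClassifier exposes as fitted_pipeline_ after fit, and load it later. The helper names and the tpot_pipeline.pkl filename are hypothetical, and because unpickling can execute arbitrary code, only load files you created yourself:

```python
import pickle
from pathlib import Path


def save_pipeline(pipeline, path="tpot_pipeline.pkl"):
    """Serialize a fitted pipeline (e.g. automl.model.fitted_pipeline_) to disk."""
    with open(path, "wb") as f:
        pickle.dump(pipeline, f)
    return Path(path)


def load_pipeline(path="tpot_pipeline.pkl"):
    """Load a pipeline saved with save_pipeline. Only use on trusted files."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

In the python -Bi main.py session, save_pipeline(automl.model.fitted_pipeline_) stores the result; later, load_pipeline().score(automl.x_test, automl.y_test) evaluates it without refitting. A pipeline containing ZeroCount still needs tpot installed to be loaded.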

The last step is to run main.py again, this time with the optimization call commented out and the training call enabled:

if __name__ == "__main__":
    automl = AutoML()
    automl.load_data()
    # automl.pipeline_optimization()
    automl.train_suggested_tpot()

That's it!
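Toggling comments between the two runs works, but a command-line argument can select the stage instead. A minimal sketch using the standard argparse module; the optimize/train stage names and the main function are hypothetical, while the AutoML method names are the ones shown above:

```python
import argparse


def parse_args(argv=None):
    """Parse the stage to run from the command line (hypothetical CLI)."""
    parser = argparse.ArgumentParser(description="Search for or train a TPOT pipeline.")
    parser.add_argument(
        "stage",
        choices=["optimize", "train"],
        help="'optimize' runs the TPOT search; 'train' fits the pipeline pasted into main.py",
    )
    return parser.parse_args(argv)


def main(automl_cls, argv=None):
    """Create the AutoML object, load the data and run the requested stage."""
    args = parse_args(argv)
    automl = automl_cls()
    automl.load_data()
    if args.stage == "optimize":
        automl.pipeline_optimization()
    else:
        automl.train_suggested_tpot()
    return automl  # returned so it stays available in a `python -i` session
```

At the bottom of main.py, the __main__ block would then become automl = main(AutoML), run as python -Bi main.py optimize (exporting from the console as before) or python -B main.py train.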

3. Contributing

Feel free to fork the project and add your own suggestions:

Fork the Project
Create your Feature Branch (git checkout -b feature/YourGreatFeature)
Commit your Changes (git commit -m 'Add some YourGreatFeature')
Push to the Branch (git push origin feature/YourGreatFeature)
Open a Pull Request

4. Contact

If you have any questions, feel free to reach out to me at:

5. License

Distributed under the MIT License. See LICENSE.md for more information.

Repository: FernandoLpz/TPOT-Optimal-Pipeline-Searching

Folders and files

Latest commit

History

Repository files navigation

TPOT: Pipelines Optimization with Genetic Algorithms

Table of Contents

1. Files

2. How to use

3. Contributing

5. Contact

6. License

About

Topics

Resources

Stars

Watchers

Forks

Languages