Awesome Ensemble Learning

Ensemble Learning (also known as Ensembling) is an exciting yet challenging field. Ensembling combines multiple base models to achieve predictive performance that is often better than that of any constituent model alone¹. It has proven critical in many practical applications and data science competitions², e.g., Kaggle.
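
As a rough illustration of why combining models can help, the sketch below computes the accuracy of a majority vote over several binary classifiers. It rests on an idealized assumption that is not part of this repository: every classifier is correct with the same probability `p`, and their errors are independent. The function name `majority_vote_accuracy` is hypothetical. Under these assumptions, three classifiers that are each 70% accurate reach 0.784 when combined; real models are correlated, so gains in practice are usually smaller.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n independent binary classifiers,
    each correct with probability p, votes for the right class."""
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


# Accuracy of the vote for growing (odd) ensemble sizes, with p = 0.7.
for n in (1, 3, 5, 11):
    print(n, round(majority_vote_accuracy(n, 0.7), 3))
```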

To support learning about ensembling, this repository collects:

Books & Academic Papers
Online Courses and Videos
Open-source and Commercial Libraries/Toolboxes and Datasets
Key Conferences & Journals

More items will be added to the repository. Please feel free to suggest other key resources by opening an issue, submitting a pull request, or emailing me at ([email protected]). Enjoy reading!

1. Books & Tutorials

1.1. Books

Ensemble Methods: Foundations and Algorithms by Zhi-Hua Zhou³: Classic textbook covering most ensemble learning techniques. A must-read for people in the field. [Full Book]

Ensemble Machine Learning: Methods and Applications edited by Cha Zhang and Yunqian Ma⁴: Responding to a shortage of literature dedicated to the topic, this volume offers comprehensive coverage of state-of-the-art ensemble learning techniques, including various contributions from researchers in leading industrial research labs.

Applications of Supervised and Unsupervised Ensemble Methods edited by Oleg Okun⁵: This book contains the extended papers presented at the 2nd Workshop on Supervised and Unsupervised Ensemble Methods and their Applications (SUEMA), in conjunction with ECAI’2008.

Data Mining and Knowledge Discovery Handbook Chapter 45 (Ensemble Methods for Classifiers) by Lior Rokach⁶: This chapter provides an overview of ensemble methods in classification tasks. It presents the important types of ensemble methods, including boosting and bagging, and discusses combining methods and modeling issues such as ensemble diversity and ensemble size.

Outlier Ensembles: An Introduction by Charu Aggarwal and Saket Sathe⁷: Great intro book for ensemble learning in outlier analysis.

1.2. Tutorials

Tutorial Title	Venue	Year	Ref	Materials
On the Power of Ensemble: Supervised and Unsupervised Methods Reconciled	SDM	2010	⁸	[HTML]

2. Courses/Seminars/Videos

Coursera - How to Win a Data Science Competition: Learn from Top Kagglers:

Ensembling (92 mins)

Coursera - Machine Learning: Classification by University of Washington partly covers the topic.

Machine Learning and Data Mining by Prof. Alexander Ihler: Section on ensembling (4 videos).

3. Toolboxes & Datasets

3.1. Toolboxes

[Python] combo: combo is a comprehensive Python toolbox for combining machine learning (ML) models and scores for various tasks, including classification, clustering, and anomaly detection. It supports the combination of ML models from core libraries such as scikit-learn and xgboost (documentation).

[Python] pycobra: A Python library implementing ensemble methods for regression and classification, along with visualisation tools including Voronoi tessellations.

[Python] DESlib: A Python library for dynamic classifier and ensemble selection.

[Python] imbalanced-learn: A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning (documentation).
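
The idea of combining scores from several models, which toolboxes such as combo provide, can be illustrated without any library. The sketch below is not combo's API; the functions `standardize` and `combine` and the example scores are hypothetical. It assumes each detector emits one anomaly score per sample (higher means more anomalous), puts the detectors on a common scale with z-scores, and then takes the per-sample average or maximum.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Z-score one detector's outputs so detectors share a common scale."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]  # constant detector carries no signal
    return [(s - mu) / sigma for s in scores]


def combine(score_lists, how="average"):
    """Combine per-detector score lists (one list per detector, same sample order)."""
    standardized = [standardize(scores) for scores in score_lists]
    per_sample = zip(*standardized)  # regroup the scores by sample
    if how == "average":
        return [mean(sample) for sample in per_sample]
    if how == "maximum":
        return [max(sample) for sample in per_sample]
    raise ValueError(f"unknown combination rule: {how}")


# Hypothetical scores from two detectors on four samples (different scales).
detector_a = [0.1, 0.2, 0.1, 0.9]
detector_b = [10.0, 12.0, 11.0, 30.0]

combined = combine([detector_a, detector_b], how="average")
most_anomalous = max(range(len(combined)), key=combined.__getitem__)
print(most_anomalous)
```

Averaging tends to smooth out a single unreliable detector, while the maximum keeps a sample flagged if any detector scores it highly; which rule fits better depends on the task.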

3.2. Datasets

As a subfield of machine learning, ensemble learning is usually tested against general machine learning benchmark datasets. Some helpful links can be found below:

4. Papers

4.1. Overview & Survey Papers

Paper Title	Venue	Year	Ref	Materials
Ensemble methods in machine learning	MCS	2000	¹⁰	[PDF]
Popular ensemble methods: An empirical study	JAIR	1999	¹¹	[PDF]
Ensemble learning: A survey	Wiley Interdisciplinary Reviews	2018	¹²	[PDF]

4.2. Key Algorithms

Abbreviation	Paper Title	Venue	Year	Ref	Materials
Bagging	Bagging predictors	Machine Learning	1996	¹³	[PDF]
Boosting	A decision-theoretic generalization of on-line learning and an application to boosting	JCSS	1997	¹⁴	[PDF]
N/A	Bagging, Boosting, and C4.5	AAAI/IAAI	1996	¹⁵	[PDF]
Stacking	Stacked generalization	Neural Networks	1992	¹⁶	[PDF]
Stacking	Stacked regressions	Machine Learning	1996	¹⁷	[PDF]
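
To make the first entry of the table concrete, here is a minimal sketch of bagging (bootstrap aggregating): train one base model per bootstrap replicate of the training set, then aggregate predictions by vote. The base learner (a one-dimensional threshold "stump"), the function names, and the toy data are illustrative assumptions chosen to keep the example dependency-free; they are not taken from the papers listed here.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a 1-D threshold classifier that predicts 1 when x >= threshold.

    `points` is a list of (x, label) pairs with labels in {0, 1}.
    Returns the candidate threshold with the fewest training errors.
    """
    candidates = sorted({x for x, _ in points})
    best_threshold, best_errors = candidates[0], len(points) + 1
    for t in candidates:
        errors = sum((x >= t) != bool(y) for x, y in points)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def bagging_fit(points, n_models=25, seed=0):
    """Train one stump per bootstrap replicate of the training set."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(points) items with replacement.
        replicate = [rng.choice(points) for _ in points]
        models.append(fit_stump(replicate))
    return models


def bagging_predict(models, x):
    """Aggregate the stumps' predictions by plurality vote."""
    votes = Counter(int(x >= t) for t in models)
    return votes.most_common(1)[0][0]


# Toy training set: small x values are class 0, large ones class 1.
train = [(0.1, 0), (0.4, 0), (0.5, 0), (0.9, 1), (1.2, 1), (1.5, 1)]
models = bagging_fit(train)
print(bagging_predict(models, 0.0), bagging_predict(models, 2.0))
```

Boosting and stacking differ mainly in how the base models are trained and combined: boosting fits them sequentially on reweighted data, and stacking learns a second-level model over their outputs.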

4.3. Boosting

Paper Title	Venue	Year	Ref	Materials
Xgboost: A scalable tree boosting system	KDD	2016	¹⁸	[PDF]
Lightgbm: A highly efficient gradient boosting decision tree	NIPS	2017	¹⁹	[PDF]
CatBoost: unbiased boosting with categorical features	NIPS	2018	²⁰	[PDF]

4.4. Clustering Ensemble

Paper Title	Venue	Year	Ref	Materials
Cluster Ensembles – A Knowledge Reuse Framework for Combining Multiple Partitions	JMLR	2002	²¹	[PDF]
Clusterer Ensemble	KBS	2006	²²	[PDF]
A survey of clustering ensemble algorithms	IJPRAI	2011	²³	[PDF]
Clustering ensemble method	Cybernetics	2019	²⁴	[PDF]

4.5. Outlier Ensemble

Paper Title	Venue	Year	Ref	Materials
Outlier ensembles: position paper	SIGKDD Explorations	2013	²⁵	[PDF]
Ensembles for unsupervised outlier detection: challenges and research questions a position paper	SIGKDD Explorations	2014	²⁶	[PDF]
Isolation forest	ICDM	2008	²⁷	[PDF]
Outlier detection with autoencoder ensembles	SDM	2017	²⁸	[PDF]
An Unsupervised Boosting Strategy for Outlier Detection Ensembles	PAKDD	2018	²⁹	[HTML]
LSCP: Locally selective combination in parallel outlier ensembles	SDM	2019	³⁰	[PDF]

4.6. Ensemble Learning for Data Stream

Paper Title	Venue	Year	Ref	Materials
A survey on ensemble learning for data stream classification	ACM Computing Surveys	2017	³¹	[PDF]
Ensemble learning for data stream analysis: A survey	Information Fusion	2017	³²	[PDF]

5. Key Conferences/Workshops/Journals

5.1. Conferences & Workshops

Key data mining conference deadlines, historical acceptance rates, and more can be found at data-mining-conferences.

ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD)

ACM International Conference on Management of Data (SIGMOD)

The Web Conference (WWW)

IEEE International Conference on Data Mining (ICDM)

SIAM International Conference on Data Mining (SDM)

IEEE International Conference on Data Engineering (ICDE)

ACM International Conference on Information and Knowledge Management (CIKM)

ACM International Conference on Web Search and Data Mining (WSDM)

The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)

The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)

5.2. Journals

ACM Transactions on Knowledge Discovery from Data (TKDD)

IEEE Transactions on Knowledge and Data Engineering (TKDE)

ACM SIGKDD Explorations Newsletter

Data Mining and Knowledge Discovery

Knowledge and Information Systems (KAIS)

References

1. Opitz, D. and Maclin, R., 1999. Popular ensemble methods: An empirical study. Journal of artificial intelligence research, 11, pp.169-198.
2. Bell, R.M. and Koren, Y., 2007. Lessons from the Netflix prize challenge. SIGKDD Explorations, 9(2), pp.75-79.
3. Zhou, Z.H., 2012. Ensemble methods: foundations and algorithms. Chapman and Hall/CRC.
4. Zhang, C. and Ma, Y. eds., 2012. Ensemble machine learning: methods and applications. Springer Science & Business Media.
5. Okun, O. ed., 2009. Applications of supervised and unsupervised ensemble methods (Vol. 245). Springer.
6. Rokach, L., 2005. Ensemble methods for classifiers. In Maimon, O. and Rokach, L. eds., Data mining and knowledge discovery handbook. Springer, Boston, MA.
7. Aggarwal, C.C. and Sathe, S., 2017. Outlier ensembles: An introduction. Springer.
8. Gao, J., Fan, W. and Han, J., 2010. On the power of ensemble: Supervised and unsupervised methods reconciled. In Tutorial on SIAM Data Mining Conference (SDM), Columbus, OH.
9. Olson, R.S., La Cava, W., Orzechowski, P., Urbanowicz, R.J. and Moore, J.H., 2017. PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData mining, 10(1), p.36.
10. Dietterich, T.G., 2000, June. Ensemble methods in machine learning. In International workshop on multiple classifier systems (pp. 1-15). Springer, Berlin, Heidelberg.
11. Opitz, D. and Maclin, R., 1999. Popular ensemble methods: An empirical study. Journal of artificial intelligence research, 11, pp.169-198.
12. Sagi, O. and Rokach, L., 2018. Ensemble learning: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4), p.e1249.
13. Breiman, L., 1996. Bagging predictors. Machine learning, 24(2), pp.123-140.
14. Freund, Y. and Schapire, R.E., 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences, 55(1), pp.119-139.
15. Quinlan, J.R., 1996, August. Bagging, boosting, and C4.5. In AAAI/IAAI, Vol. 1 (pp. 725-730).
16. Wolpert, D.H., 1992. Stacked generalization. Neural networks, 5(2), pp.241-259.
17. Breiman, L., 1996. Stacked regressions. Machine learning, 24(1), pp.49-64.
18. Chen, T. and Guestrin, C., 2016, August. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining (pp. 785-794). ACM.
19. Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q. and Liu, T.Y., 2017. Lightgbm: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems (pp. 3146-3154).
20. Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V. and Gulin, A., 2018. CatBoost: unbiased boosting with categorical features. In Advances in Neural Information Processing Systems (pp. 6638-6648).
21. Strehl, A. and Ghosh, J., 2002. Cluster ensembles---a knowledge reuse framework for combining multiple partitions. Journal of machine learning research, 3(Dec), pp.583-617.
22. Zhou, Z.H. and Tang, W., 2006. Clusterer ensemble. Knowledge-Based Systems, 19(1), pp.77-83.
23. Vega-Pons, S. and Ruiz-Shulcloper, J., 2011. A survey of clustering ensemble algorithms. International Journal of Pattern Recognition and Artificial Intelligence, 25(03), pp.337-372.
24. Alqurashi, T. and Wang, W., 2019. Clustering ensemble method. International Journal of Machine Learning and Cybernetics, 10(6), pp.1227-1246.
25. Aggarwal, C.C., 2013. Outlier ensembles: position paper. ACM SIGKDD Explorations Newsletter, 14(2), pp.49-58.
26. Zimek, A., Campello, R.J. and Sander, J., 2014. Ensembles for unsupervised outlier detection: challenges and research questions a position paper. ACM SIGKDD Explorations Newsletter, 15(1), pp.11-22.
27. Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. In International Conference on Data Mining (pp. 413-422). IEEE.
28. Chen, J., Sathe, S., Aggarwal, C. and Turaga, D., 2017, June. Outlier detection with autoencoder ensembles. In SIAM International Conference on Data Mining (pp. 90-98). Society for Industrial and Applied Mathematics.
29. Campos, G.O., Zimek, A. and Meira, W., 2018, June. An Unsupervised Boosting Strategy for Outlier Detection Ensembles. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 564-576). Springer, Cham.
30. Zhao, Y., Nasrullah, Z., Hryniewicki, M.K. and Li, Z., 2019, May. LSCP: Locally selective combination in parallel outlier ensembles. In Proceedings of the 2019 SIAM International Conference on Data Mining (SDM) (pp. 585-593). Society for Industrial and Applied Mathematics.
31. Gomes, H.M., Barddal, J.P., Enembreck, F. and Bifet, A., 2017. A survey on ensemble learning for data stream classification. ACM Computing Surveys (CSUR), 50(2), p.23.
32. Krawczyk, B., Minku, L.L., Gama, J., Stefanowski, J. and Woźniak, M., 2017. Ensemble learning for data stream analysis: A survey. Information Fusion, 37, pp.132-156.
