Distance Metric Recommendation for $k$-Means Clustering: A Meta-Learning Approach

This work was accepted for paper presentation at the 2022 IEEE Region 10 Conference (TENCON 2022), held virtually and in person in Hong Kong:

- The final version of our paper (as published in the conference proceedings of TENCON 2022) can be accessed via this link.
- Our preprint can be accessed via this link.
- Our TENCON 2022 presentation slides can be accessed via this link.
Our dataset of datasets is publicly released for future researchers.
Kindly refer to 0. Directory.ipynb for a guide on navigating through this repository.

If you find our work useful, please consider citing:

@INPROCEEDINGS{9978037,
  author={Gonzales, Mark Edward M. and Uy, Lorene C. and Sy, Jacob Adrianne L. and Cordel, Macario O.},
  booktitle={TENCON 2022 - 2022 IEEE Region 10 Conference (TENCON)}, 
  title={Distance Metric Recommendation for k-Means Clustering: A Meta-Learning Approach}, 
  year={2022},
  pages={1-6},
  doi={10.1109/TENCON55691.2022.9978037}}

This repository is also archived on Zenodo.

Description

ABSTRACT: The choice of distance metric impacts the clustering quality of centroid-based algorithms, such as $k$-means. Theoretical attempts to select the optimal metric entail deep domain knowledge, while experimental approaches are resource-intensive. This paper presents a meta-learning approach to automatically recommend a distance metric for $k$-means clustering that optimizes the Davies-Bouldin score. Three distance measures were considered: Chebyshev, Euclidean, and Manhattan. General, statistical, information-theoretic, structural, and complexity meta-features were extracted, and random forest was used to construct the meta-learning model; borderline SMOTE was applied to address class imbalance. The model registered an accuracy of 70.59%. Employing Shapley additive explanations, it was found that the mean of the sparsity of the attributes has the highest meta-feature importance. Feeding only the top 25 most important meta-features increased the accuracy to 71.57%. The main contribution of this paper is twofold: the construction of a meta-learning model for distance metric recommendation and a fine-grained analysis of the importance and effects of the meta-features on the model’s output.

INDEX TERMS: meta-learning, meta-features, $k$-means, clustering, distance metric, random forest
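
The labeling idea behind the abstract (run $k$-means once per candidate distance metric, then keep the metric with the best Davies-Bouldin score) can be illustrated with a minimal, dependency-free sketch. This is not the code from the notebooks in this repository. The function names (`k_means`, `davies_bouldin`, `recommend_metric`) and the toy data are hypothetical. Two simplifying assumptions are made: centroids are updated as arithmetic means for all three metrics, and the Davies-Bouldin score is always computed with Euclidean distances so that scores stay comparable across metrics. Lower scores indicate better-separated clusters.

```python
import random

# Candidate distance functions between two equal-length points.
def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

METRICS = {"chebyshev": chebyshev, "euclidean": euclidean, "manhattan": manhattan}

def centroid(points):
    """Arithmetic mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def k_means(points, k, dist, n_iter=50, seed=0):
    """Lloyd-style k-means: assign with `dist`, update centers as means (assumption)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the previous center if a cluster is empty
                centers[j] = centroid(members)
    return labels, centers

def davies_bouldin(points, labels, k):
    """Davies-Bouldin score with Euclidean distances (assumption); lower is better."""
    clusters = [[p for p, l in zip(points, labels) if l == j] for j in range(k)]
    clusters = [c for c in clusters if c]  # drop empty clusters
    if len(clusters) < 2:
        return float("inf")  # the score is undefined for a single cluster
    cents = [centroid(c) for c in clusters]
    # Mean distance of each cluster's points to its own centroid.
    scatter = [sum(euclidean(p, c) for p in pts) / len(pts)
               for pts, c in zip(clusters, cents)]
    worst = []
    for i in range(len(clusters)):
        # Assumes distinct centroids, so the denominator is non-zero.
        ratios = [(scatter[i] + scatter[j]) / euclidean(cents[i], cents[j])
                  for j in range(len(clusters)) if j != i]
        worst.append(max(ratios))
    return sum(worst) / len(worst)

def recommend_metric(points, k):
    """Return the metric whose clustering has the lowest Davies-Bouldin score."""
    scores = {}
    for name, dist in METRICS.items():
        labels, _ = k_means(points, k, dist)
        scores[name] = davies_bouldin(points, labels, k)
    best = min(scores, key=scores.get)
    return best, scores

# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
best, scores = recommend_metric(points, k=2)
print(best, scores)
```

In the meta-learning setting described above, a label obtained this way for each dataset would serve as the target class, with the extracted meta-features as inputs to the random forest. On toy data this cleanly separated, all three metrics may produce the same partition and therefore tie.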

Authors

Mark Edward M. Gonzales
Lorene C. Uy
Jacob Adrianne L. Sy
Dr. Macario O. Cordel, II

This is the major course output in a machine learning class for master's students under Dr. Macario O. Cordel, II of the Department of Computer Technology, De La Salle University. The task is to create a ten-week investigatory project that applies machine learning to a particular research area or offers a substantial theoretical or algorithmic contribution to existing machine learning techniques.

