havelhakimi/seeds

Prototype Based Clustering Analysis on seeds dataset

This is a solution notebook for an assignment question from a graduate Data Mining course. Each code block is accompanied by analysis where relevant.
Dataset link: https://archive.ics.uci.edu/ml/datasets/seeds
Broadly, the notebook performs the following steps:

  • Applied minimal preprocessing to the dataset
  • Explained the limitations of KMeans
  • Suggested two existing algorithms (KMedoids and CLARANS) designed to mitigate those limitations
  • Visualized the given class labels using TSNE
  • Ran KMedoids and CLARANS on the seeds dataset and reported the best results obtained on various cluster validity indices
    • Compared these results with those of KMeans
  • Reported and visualized the hyperparameter tuning of KMedoids and CLARANS that produced the best results on the seeds dataset

The assumptions made and the order of the work follow the questions asked in the assignment.
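
To make the KMeans versus KMedoids contrast in the steps above concrete, here is a minimal, standard-library-only sketch of the alternating ("Voronoi iteration") form of k-medoids. It is not the notebook's code: the function `k_medoids`, its `init` parameter, the Manhattan distance, and the toy points are all assumptions chosen for illustration. The sketch assumes that one of the KMeans limitations in question is sensitivity to outliers, and shows that a medoid, being an actual data point, is pulled far less by an outlier than a cluster mean.

```python
import random


def manhattan(a, b):
    """L1 distance between two equal-length tuples (an assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign(points, medoids, dist):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, k, dist=manhattan, init=None, max_iter=100, seed=0):
    """Alternating k-medoids. Returns (medoid_indices, labels).

    `init` is an optional list of k starting indices (hypothetical helper
    parameter); if omitted, k distinct points are sampled at random.
    """
    medoids = list(init) if init is not None else random.Random(seed).sample(
        range(len(points)), k
    )
    for _ in range(max_iter):
        labels = assign(points, medoids, dist)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to all other members of its cluster.
            new_medoids.append(
                min(members, key=lambda i: sum(dist(points[i], points[j]) for j in members))
            )
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, dist)


# Toy data: two tight groups plus one outlier at (5.0, 30.0).
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
    (5.0, 30.0),
]
medoids, labels = k_medoids(points, k=2, init=[0, 3])

# Compare the second cluster's medoid with its mean along the y axis.
cluster = [i for i, lab in enumerate(labels) if lab == 1]
mean_y = sum(points[i][1] for i in cluster) / len(cluster)
print("medoid:", points[medoids[1]], "mean y:", mean_y)
```

With this starting point the outlier joins the second group, whose mean is dragged to y = 11.275 while its medoid stays at (5.0, 5.0). For real experiments a maintained implementation is preferable, for example `KMedoids` from the scikit-learn-extra package; CLARANS replaces the exhaustive medoid update with randomized sampling of candidate swaps and is not shown here.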