preetham-salehundam/Text-Mining

Text-Mining

Text mining using the 20 mini newsgroups dataset

This project focuses on feature extraction, feature selection, classification, and clustering in the context of text mining.

The project consists of the following four parts:

  • Part 1: Extracting features

  • Part 2: Classification

  • Part 3: Feature selection

  • Part 4: Document clustering

From the given collection of documents, we generate a feature vector for each document using Term Frequency (TF), Inverse Document Frequency (IDF), and Term Frequency-Inverse Document Frequency (TF-IDF) weighting. Using these vectors, we classify the documents with four different classifiers, apply two different feature selection methods, and finally cluster the documents with the k-means and hierarchical clustering algorithms.
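To make the feature extraction step concrete, here is a minimal sketch of TF, IDF, and TF-IDF weighting in plain Python. It is an illustration, not the code in this repository: the function names (`tokenize`, `term_frequency`, `inverse_document_frequency`, `tf_idf_vectors`), the whitespace tokenizer, the length-normalized TF, the unsmoothed `log(N / df)` IDF, and the three toy documents are all assumptions chosen for clarity. The project itself may use different variants.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; a real pipeline would
    # likely also strip punctuation, headers, and stop words.
    return text.lower().split()


def term_frequency(tokens):
    # TF: count of each term divided by the document length.
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def inverse_document_frequency(tokenized_docs):
    # IDF: log(N / df), where df is the number of documents containing the term.
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf_vectors(docs):
    # Build one TF-IDF vector per document over a shared, sorted vocabulary.
    tokenized = [tokenize(doc) for doc in docs]
    idf = inverse_document_frequency(tokenized)
    vocab = sorted(idf)
    vectors = []
    for tokens in tokenized:
        tf = term_frequency(tokens)
        vectors.append([tf.get(term, 0.0) * idf[term] for term in vocab])
    return vocab, vectors


# Hypothetical toy corpus, not taken from the newsgroup data.
docs = [
    "the rocket launch was delayed",
    "the hockey game was delayed",
    "the rocket engine test",
]
vocab, vectors = tf_idf_vectors(docs)
for doc, vector in zip(docs, vectors):
    print(doc, [round(weight, 3) for weight in vector])
```

With this weighting, a term that occurs in every document (such as "the" in the toy corpus) gets an IDF of zero and so contributes nothing to any vector, while a term confined to one document is weighted most heavily. The resulting vectors are the kind of input that the classification, feature selection, and clustering parts operate on.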