Using supervised and unsupervised machine learning methods to process and classify raw text data.

coreycoole/clustering_text_classification

clustering_text_classification

Using supervised and unsupervised machine learning methods to process and classify raw text data.

  • Objective: Pick a set of texts, process them, and apply a series of unsupervised clustering methods to group them. Then analyze which clustering method groups the texts most consistently with respect to the person of interest. Apply supervised and unsupervised permutations of feature selection and generation to build a model that classifies the texts by person of interest. Lastly, evaluate this model against a 25% holdout group, analyze the consistency of its predictions, and explain any notable divergences.

  • Data Source: Wikiquote.org

  • Pull 700 quotes from 10 Wikiquote archives (70 per person).

  • Persons of interest: Plato, Socrates, Sigmund Freud, Friedrich Nietzsche, René Descartes, Immanuel Kant, David Hume, Bertrand Russell, John Locke, Noam Chomsky
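
As a rough illustration of two steps in the objective, the sketch below scores how consistently a clustering groups texts by person of interest and carves out a 25% holdout group. It is not the repository's code. The purity score is one assumed way to measure "consistency", and the function names `cluster_purity` and `stratified_holdout`, the toy labels, and the cluster assignments are all hypothetical. It uses only the Python standard library.

```python
import random
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, authors):
    """Fraction of texts whose author is the majority author of their cluster."""
    by_cluster = defaultdict(list)
    for cid, author in zip(cluster_ids, authors):
        by_cluster[cid].append(author)
    # In each cluster, count the texts that belong to its most common author.
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in by_cluster.values()
    )
    return majority_total / len(authors)


def stratified_holdout(authors, holdout_frac=0.25, seed=0):
    """Return (train_idx, holdout_idx), holding out the same share of each author."""
    rng = random.Random(seed)
    by_author = defaultdict(list)
    for idx, author in enumerate(authors):
        by_author[author].append(idx)
    train_idx, holdout_idx = [], []
    for indices in by_author.values():
        rng.shuffle(indices)
        # Floor the per-author holdout count (an assumption of this sketch).
        n_holdout = int(len(indices) * holdout_frac)
        holdout_idx.extend(indices[:n_holdout])
        train_idx.extend(indices[n_holdout:])
    return sorted(train_idx), sorted(holdout_idx)


# Toy data: 3 authors with 8 texts each (the project uses 10 authors x 70 quotes).
authors = ["Plato"] * 8 + ["Hume"] * 8 + ["Kant"] * 8

# Hypothetical cluster assignments from two clustering methods.
method_a = [0] * 8 + [1] * 8 + [2] * 8  # clusters line up with authors
method_b = [0] * 12 + [1] * 12          # two clusters that mix authors

print(cluster_purity(method_a, authors))
print(cluster_purity(method_b, authors))

train_idx, holdout_idx = stratified_holdout(authors)
print(len(train_idx), len(holdout_idx))
```

A higher purity means the clusters line up more closely with the persons of interest, so the score gives a single number for comparing clustering methods. Because the split is stratified, every person is represented in the holdout group. With 70 quotes per person, flooring holds out 17 quotes each (170 of 700), slightly under 25%.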
