vasgat/jSimilarity

jSimilarity

jSimilarity is a library that implements various similarity measures.

String Character-based Similarities:
Jaro
Jaro-Winkler
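
To illustrate what the two character-based measures compute, here is a minimal, self-contained sketch. It is not jSimilarity's API: the class name `JaroSketch` and its method names are hypothetical, and the prefix scale of 0.1 with a maximum prefix length of 4 are the conventional Winkler defaults, assumed here rather than taken from the library.

```java
// Illustrative sketch only; JaroSketch is a hypothetical name, not a jSimilarity class.
final class JaroSketch {

    // Jaro similarity in [0, 1]: based on matching characters and transpositions.
    static double jaro(String a, String b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        if (a.isEmpty() || b.isEmpty()) return 0.0;

        // Characters match only if they are equal and at most `window` positions apart.
        int window = Math.max(0, Math.max(a.length(), b.length()) / 2 - 1);
        boolean[] aMatched = new boolean[a.length()];
        boolean[] bMatched = new boolean[b.length()];
        int matches = 0;

        for (int i = 0; i < a.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(b.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!bMatched[j] && a.charAt(i) == b.charAt(j)) {
                    aMatched[i] = true;
                    bMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;

        // Count matched characters that appear in a different order in the two strings.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < a.length(); i++) {
            if (!aMatched[i]) continue;
            while (!bMatched[k]) k++;
            if (a.charAt(i) != b.charAt(k)) outOfOrder++;
            k++;
        }

        double m = matches;
        double t = outOfOrder / 2.0; // transpositions
        return (m / a.length() + m / b.length() + (m - t) / m) / 3.0;
    }

    // Jaro-Winkler: boosts the Jaro score for strings sharing a common prefix.
    // Assumed defaults: scale 0.1, prefix capped at 4 characters.
    static double jaroWinkler(String a, String b) {
        double jaro = jaro(a, b);
        int limit = Math.min(4, Math.min(a.length(), b.length()));
        int prefix = 0;
        while (prefix < limit && a.charAt(prefix) == b.charAt(prefix)) prefix++;
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }
}
```

With the classic example pair, `JaroSketch.jaro("MARTHA", "MARHTA")` is about 0.944 and `JaroSketch.jaroWinkler("MARTHA", "MARHTA")` is about 0.961, because the shared prefix "MAR" raises the score.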

String Token-based Similarities:
Jaccard
Cosine similarity
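
The token-based measures compare strings as collections of tokens rather than character sequences. The sketch below shows the idea under stated assumptions: `TokenSimilaritySketch` and its methods are hypothetical names, the whitespace tokenizer is a stand-in (it is not the library's `BasicTokenizer`), and cosine similarity is computed over raw token counts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; not jSimilarity's API.
final class TokenSimilaritySketch {

    // Assumed tokenizer: lowercase, split on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Jaccard: |A ∩ B| / |A ∪ B| over the sets of distinct tokens.
    static double jaccard(List<String> a, List<String> b) {
        Set<String> setA = new HashSet<>(a);
        Set<String> setB = new HashSet<>(b);
        if (setA.isEmpty() && setB.isEmpty()) return 1.0;
        Set<String> intersection = new HashSet<>(setA);
        intersection.retainAll(setB);
        int union = setA.size() + setB.size() - intersection.size();
        return (double) intersection.size() / union;
    }

    // Cosine: dot product of the token-count vectors divided by their norms.
    static double cosine(List<String> a, List<String> b) {
        Map<String, Integer> countsA = counts(a);
        Map<String, Integer> countsB = counts(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : countsA.entrySet()) {
            dot += e.getValue() * countsB.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(countsA);
        double normB = norm(countsB);
        if (normA == 0.0 || normB == 0.0) return 0.0;
        return dot / (normA * normB);
    }

    private static Map<String, Integer> counts(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) counts.merge(t, 1, Integer::sum);
        return counts;
    }

    private static double norm(Map<String, Integer> counts) {
        double sumOfSquares = 0.0;
        for (int c : counts.values()) sumOfSquares += (double) c * c;
        return Math.sqrt(sumOfSquares);
    }
}
```

For "the quick brown fox" and "the quick red fox", three of the five distinct tokens are shared, so Jaccard is 0.6; the count vectors have dot product 3 and norm 2 each, so cosine is 0.75.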

Document-based Similarities:
TF-IDF
SoftTFIDF

Useful implemented utilities:
TextDocument
Corpus
BasicTokenizer

jSimilarity focuses mainly on tf-idf and implements several of its variations (Smooth IDF, Max IDF, Normalized TF, Double Normalization 0.5, etc.).
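
As a rough guide to what these variation names usually mean, the sketch below follows the commonly cited textbook definitions. It is an assumption that jSimilarity uses exactly these formulas; the class `TfIdfSketch` and all method names are hypothetical, and a document is modeled simply as a list of tokens.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; formulas are textbook conventions, not confirmed library behavior.
final class TfIdfSketch {

    // Normalized TF: raw count divided by the document length.
    static double normalizedTf(List<String> doc, String term) {
        if (doc.isEmpty()) return 0.0;
        return (double) Collections.frequency(doc, term) / doc.size();
    }

    // Double Normalization 0.5: 0.5 + 0.5 * count / (count of the most frequent term).
    static double doubleNormalizedTf(List<String> doc, String term) {
        int max = 0;
        for (String t : new HashSet<>(doc)) {
            max = Math.max(max, Collections.frequency(doc, t));
        }
        if (max == 0) return 0.0;
        return 0.5 + 0.5 * Collections.frequency(doc, term) / max;
    }

    // Number of documents in the corpus that contain the term.
    static int documentFrequency(List<List<String>> corpus, String term) {
        int df = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) df++;
        }
        return df;
    }

    // Plain IDF: ln(N / df). Returns 0 for unseen terms to avoid division by zero.
    static double idf(List<List<String>> corpus, String term) {
        int df = documentFrequency(corpus, term);
        return df == 0 ? 0.0 : Math.log((double) corpus.size() / df);
    }

    // Smooth IDF: ln(N / (1 + df)) + 1.
    static double smoothIdf(List<List<String>> corpus, String term) {
        int df = documentFrequency(corpus, term);
        return Math.log((double) corpus.size() / (1 + df)) + 1.0;
    }

    // Max IDF: ln(maxDf / (1 + df)), where maxDf is the largest document frequency of any term.
    static double maxIdf(List<List<String>> corpus, String term) {
        Set<String> vocabulary = new HashSet<>();
        for (List<String> doc : corpus) vocabulary.addAll(doc);
        int maxDf = 0;
        for (String t : vocabulary) {
            maxDf = Math.max(maxDf, documentFrequency(corpus, t));
        }
        return Math.log((double) maxDf / (1 + documentFrequency(corpus, term)));
    }
}
```

A tf-idf weight is then the product of one TF variant and one IDF variant, for example `normalizedTf(doc, term) * smoothIdf(corpus, term)`.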