Big data homework solutions
-
Updated
Jun 21, 2017 - Python
Big data homework solutions
Aurora karton for calculating minhash from input dataset.
University work. Approximate aligner for long DNA sequences. Estimates Jaccard similarity from k-mers via minimizers and MinHash, then uses it as a sequence identity proxy.
MPEI Project - Search repeated news (using Bloom filter) and similar news (using MinHash) from a news API.
Probability Methods for Informatics Engineering | UA 2018/2019
similarity of the texts (Jaccard Similarity, Minhash, LSH)
Assignment-2 for CS F469 Information Retrieval Course
This repository contains code and analysis for a homework assignment on recommendation systems and clustering algorithms in Python. Implements techniques like minhash, LSH, feature engineering, dimensionality reduction, K-means and DBSCAN clustering.
Finding Similar Items: Textually Similar Documents
SpellChecker: an application to check for spell errors.
(WIP) HTTP server that deploy distributes Vokter (https://github.com/vokter/vokter) through a REST API.
Python package for fast MinHash calculation and operations
An implementation of the MinHashing algorithm in C using POSIX threads.
Implementation of the paper "Finding Highly Correlated Pairs with Powerful Pruning" in Java.
Trabalho Prático da UC de Métodos Probabilísticos para Engenharia Informática, UA 2019/2020
Sample Jetty/Jersey2 server that interoperates with a running Vokter server (https://github.com/vokter/vokter).
Minhash text analyzer developed during Algorithmics subject.
Add a description, image, and links to the minhash topic page so that developers can more easily learn about it.
To associate your repository with the minhash topic, visit your repo's landing page and select "manage topics."