Topic Modelling for Humans
Updated Jun 4, 2024 - Python
Hybrid approach combining dictionary-based NER and doc2vec
An approach exploring and assessing literature-based doc-2-doc recommendations using word2vec combined with doc2vec, and applying it to TREC and RELISH datasets
An approach exploring and assessing literature-based doc-2-doc recommendations using doc2vec, and applying it to TREC and RELISH datasets
Q3 of Final Project Assignment of the course 'Foundations of Data Science' @ CBS
Topic Modeling in Cython
A simple Django-based resume ranker website where recruiters post their jobs and candidates apply for their desired vacancies. The system computes the document similarity between the job description and the candidate resumes, generates similarity scores using the KNN model, and ranks or shortlists the candidate resumes.
A PoC on document comparison using various methods in NLP
Information Retrieval Lab
Assesses MinHash LSH for text similarity, comparing it with kNN over BART embeddings as ground truth. Covers data preprocessing, shingle creation, and LSH experiments. The findings characterize the efficiency of LSH in document similarity tasks.
Compares sentences from an input document with all sentences from reference documents to find very similar ones.
Approximate document similarity with Minhash + Locality Sensitive Hashing
NLP of Warren Buffett's annual letter to shareholders
Repo for Aspire - A scientific document similarity model based on matching fine-grained aspects of scientific papers.
This movie recommendation system is designed to provide users with movie recommendations based on the similarity between movies. The system utilizes cosine similarity to identify movies that are closely related in terms of their features, allowing users to discover similar movies based on their preferences.
Given a set of documents and a minimum required similarity threshold, find the number of document pairs that exceed the threshold
Document Search Engine project with TF-IDF and Google Universal Sentence Encoder model
DocxMatch is a Streamlit app that analyzes the similarity between Word files.
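Several of the entries above revolve around the same task: scoring pairs of documents and keeping those whose similarity exceeds a threshold. The following is a minimal sketch of that idea, not taken from any of the listed repositories. It assumes word-level shingles and exact Jaccard similarity; the names `shingles`, `jaccard`, and `count_similar_pairs` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def shingles(text, k=2):
    """Return the set of k-word shingles of a text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts yield a single shingle, or none if the text is empty.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; taken to be 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def count_similar_pairs(docs, threshold, k=2):
    """Count unordered document pairs whose similarity exceeds the threshold."""
    sets = [shingles(d, k) for d in docs]
    return sum(1 for x, y in combinations(sets, 2) if jaccard(x, y) > threshold)


docs = [
    "the quick brown fox jumps",
    "the quick brown fox leaps",
    "an entirely different sentence here",
]
print(count_similar_pairs(docs, threshold=0.5))
```

This brute-force version compares every pair exactly, so its cost grows quadratically with the number of documents. MinHash combined with locality-sensitive hashing, as mentioned in some of the descriptions above, approximates the same Jaccard scores while avoiding most of those comparisons.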