"Word2Vec for Russian text" - my project for course "Scientific Data Computing" in University of Tartu. It was presented as 20-minutes talk on 6th Estonian Digital Humanities Conference at September 2018.
NLP Dataset Creation and Semantic Search Demonstration
Retrieval-Augmented Generation using Azure OpenAI
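A minimal sketch of that RAG flow with the `openai` package's AzureOpenAI client; the endpoint, key, deployment names, and toy documents below are placeholders, not values from this project.

```python
import numpy as np
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-key>",                                       # placeholder
    api_version="2024-02-01",
)

documents = ["Tartu is a city in Estonia.", "Word2Vec learns word vectors."]

def embed(texts):
    # "text-embedding-ada-002" must match your embedding deployment name
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(documents)
query = "What does Word2Vec do?"
q_vec = embed([query])[0]

# Cosine similarity picks the most relevant document as context
scores = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
context = documents[int(scores.argmax())]

answer = client.chat.completions.create(
    model="gpt-35-turbo",  # must match your chat deployment name
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context: {context}\n\nQuestion: {query}"},
    ],
)
print(answer.choices[0].message.content)
```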
Universal-Sentence-Encoder-Multilingual-QA is a model developed by researchers at Google, primarily for question answering. You can use this template to import the model into Inferless.
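Outside the Inferless template, the model can also be loaded directly from TF Hub. A hedged sketch, following the usage documented for universal-sentence-encoder-multilingual-qa; the questions, answers, and contexts are made-up examples.

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401 -- registers the SentencePiece ops the model needs

module = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-qa/3")

questions = ["What is your age?"]
responses = ["I am 20 years old.", "Good morning."]
contexts = ["I will be 21 next year.", "It is a lovely day."]

q_emb = module.signatures["question_encoder"](tf.constant(questions))["outputs"]
r_emb = module.signatures["response_encoder"](
    input=tf.constant(responses), context=tf.constant(contexts)
)["outputs"]

# Dot product ranks candidate answers for each question
print(np.inner(q_emb, r_emb))
```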
"BrightPsych" is a holistic mental health platform featuring a supportive chatbot and detail CBT analysis for disorders. Daily Mood Tracking aids emotional well-being, while data analysis unveils student mental health trends. Guided mindfulness contribute to resilience in a nurturing space. Empower, Engage and Elevate through Community Forum.
BGE-M3 is an innovative project known for its versatility, featuring Multi-Functionality, Multi-Linguality, and Multi-Granularity.
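A short sketch, assuming the FlagEmbedding package and the BAAI/bge-m3 checkpoint, showing the three output types (dense, sparse/lexical, multi-vector) behind the multi-functionality claim; the sentences are toy examples.

```python
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

sentences = ["What is BGE-M3?", "BGE-M3 supports dense, sparse and multi-vector retrieval."]
out = model.encode(
    sentences, return_dense=True, return_sparse=True, return_colbert_vecs=True
)

print(out["dense_vecs"].shape)       # dense embeddings
print(out["lexical_weights"][0])     # sparse lexical weights (token -> weight)
print(out["colbert_vecs"][0].shape)  # multi-vector (ColBERT-style) representations
```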
MedCPT generates embeddings of biomedical texts that can be used for semantic search (dense retrieval). The MedCPT Query Encoder computes embeddings of short texts (e.g., questions, search queries, sentences). In this template, we will import the MedCPT Query Encoder on the Inferless Platform.
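For reference, a sketch of computing query embeddings directly with Hugging Face transformers (not via Inferless), assuming the ncbi/MedCPT-Query-Encoder checkpoint; the CLS-token output is taken as the embedding, following the model card, and the queries are toy examples.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "ncbi/MedCPT-Query-Encoder"  # assumed Hugging Face id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

queries = ["diabetes treatment", "How to treat hypertension?"]

with torch.no_grad():
    encoded = tokenizer(
        queries, padding=True, truncation=True, max_length=64, return_tensors="pt"
    )
    # The [CLS] token representation serves as the query embedding
    embeddings = model(**encoded).last_hidden_state[:, 0, :]

print(embeddings.shape)  # (2, hidden_size)
```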
State-of-the-art Ember embedding model for retrieval-augmented generation
This is a sentence embedding model, initialized from xlm-roberta-large and continually trained on a mixture of multilingual datasets. It supports 100 languages from xlm-roberta, but low-resource languages may see performance degradation.
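A hedged sentence-transformers sketch; the model id below is an assumption based on that description (E5-style models additionally expect "query:" / "passage:" prefixes at inference time), and the sentences are toy examples.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/multilingual-e5-large")  # assumed model id

sentences = [
    "query: How does Word2Vec work?",
    "passage: Word2Vec learns vector representations of words from raw text.",
    "passage: Tartu on linn Eestis.",  # Estonian: "Tartu is a city in Estonia."
]
embeddings = model.encode(sentences, normalize_embeddings=True)

# Cosine similarity of the query against both passages
print(util.cos_sim(embeddings[0], embeddings[1:]))
```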
The MS-marco-MiniLM-L-12-v2 model can be used for information retrieval: given a query, encode the query with all candidate passages (e.g., retrieved with Elasticsearch), then sort the passages in decreasing order of score.
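A sketch of that reranking step with the sentence-transformers CrossEncoder class; the query and passages are toy examples.

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2")

query = "How many people live in Berlin?"
passages = [
    "Berlin has a population of about 3.7 million registered inhabitants.",
    "Berlin is well known for its museums.",
]

# Score each (query, passage) pair, then sort passages by descending relevance
scores = model.predict([(query, p) for p in passages])
for passage, score in sorted(zip(passages, scores), key=lambda x: x[1], reverse=True):
    print(f"{score:.3f}  {passage}")
```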
Julia experimentation using sequence-based NLP models
OpenAI text embeddings: clean, process, and create vectorized representations of text for indexing and semantic search
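A minimal sketch of that pipeline with the openai Python client; the embedding model name and the texts are illustrative, not taken from the repository.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

texts = ["  Word2Vec for Russian text \n", "Semantic search over course materials"]
cleaned = [" ".join(t.split()) for t in texts]  # basic cleaning: collapse whitespace

resp = client.embeddings.create(model="text-embedding-3-small", input=cleaned)
vectors = np.array([item.embedding for item in resp.data])

print(vectors.shape)  # (2, 1536) for text-embedding-3-small
# `vectors` can now be stored in a vector index for semantic search
```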
jina-embeddings-v2-base-en is an English, monolingual embedding model supporting 8192 sequence length. It is based on a BERT architecture (JinaBERT) that supports the symmetric bidirectional variant of ALiBi to allow longer sequence length. The backbone jina-bert-v2-base-en is pretrained on the C4 dataset.
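A short sketch following the usage shown on the model card; trust_remote_code is needed because JinaBERT is a custom architecture, and the two sentences are toy inputs.

```python
from numpy.linalg import norm
from transformers import AutoModel

# trust_remote_code=True pulls in the custom JinaBERT (ALiBi) implementation
model = AutoModel.from_pretrained("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)

embeddings = model.encode(
    ["How is the weather today?", "What is the current weather like today?"]
)
cos_sim = embeddings[0] @ embeddings[1] / (norm(embeddings[0]) * norm(embeddings[1]))
print(cos_sim)
```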
Functionality of a bot that fills in a particular form on request from the user
Kickstarter project success or failure prediction, using Word2Vec to train the embedding file.
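A hedged sketch of the embedding-training step with gensim's Word2Vec; the toy texts stand in for the real Kickstarter descriptions and the output filename is made up.

```python
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# Toy stand-ins for Kickstarter project descriptions
texts = [
    "An innovative board game about space exploration",
    "A documentary film on traditional folk music",
    "Smart watch with open source firmware",
]
sentences = [simple_preprocess(t) for t in texts]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)
model.wv.save_word2vec_format("kickstarter_embeddings.txt")  # plain-text embedding file

print(model.wv["game"][:5])  # first few dimensions of one word vector
```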
Using the OnionOrNot dataset from Kaggle to train a binary classification model with the Keras deep learning library.
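A minimal Keras sketch of such a binary classifier; the two headlines and labels below are invented placeholders for the real dataset, and the architecture is only one reasonable choice.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Invented placeholders for OnionOrNot headlines (label 1 = satirical)
headlines = ["Man wins lottery twice in one week", "Area dad refuses to ask for directions"]
labels = [0, 1]

vectorizer = layers.TextVectorization(max_tokens=20000, output_sequence_length=64)
vectorizer.adapt(headlines)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,), dtype=tf.string),
    vectorizer,
    layers.Embedding(20000, 64),
    layers.GlobalAveragePooling1D(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary output: Onion or not
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(tf.constant(headlines), tf.constant(labels), epochs=3)
```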
Some applications of text embedding models, e.g., semantic retrieval and clustering.
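A combined sketch of both applications with sentence-transformers and scikit-learn; the model id and the small corpus are assumptions for illustration.

```python
from sentence_transformers import SentenceTransformer, util
from sklearn.cluster import KMeans

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model id

corpus = [
    "How do I train word2vec on Russian text?",
    "Word2Vec tutorial for Cyrillic corpora",
    "Best pizza places in Tartu",
    "Restaurants near the University of Tartu",
]
embeddings = model.encode(corpus, normalize_embeddings=True)

# Semantic retrieval: rank corpus entries against a query by cosine similarity
query_emb = model.encode("training word embeddings", normalize_embeddings=True)
hits = util.semantic_search(query_emb, embeddings, top_k=2)[0]
print([corpus[hit["corpus_id"]] for hit in hits])

# Clustering: group the same embeddings into two topics
cluster_labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
print(cluster_labels)
```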