text-normalization

Here are 45 public repositories matching this topic...

shihjen / NLP

Jupyter Notebook for NLP Tasks

natural-language-processing regex word-cloud nltk stopwords tokenization text-normalization

Updated Jun 21, 2024
Jupyter Notebook

NVIDIA / NeMo-text-processing

NeMo text processing for ASR and TTS

text-normalization inverse-text-n

Updated Jun 18, 2024
Python

areeba0 / English-to-French-Translation-using-NLTK-and-Hugging-Face-Transformers-MarianMTModel

This repository provides a complete workflow for text processing using Hugging Face Transformers and NLTK. It includes modules for sentence normalization, spelling correction, word embedding generation, positional encoding computation, and English-to-French translation.

python nlp word-embeddings jupyter-notebook nltk text-normalization positional-encoding huggingface-transformers english-to-french-translation

Updated Jun 18, 2024
Jupyter Notebook

kscanne / caighdean

The translation engine behind the Irish standardizer (Caighdeánaitheoir na Gaeilge), along with Scottish Gaelic/Manx→Irish translators

translation irish gaelic gaeilge text-normalization gaelg-gaeilge gaidhlig gaelg

Updated Jun 12, 2024
Perl

curegit / unicodecheck

Simple tool to check if Unicode text files are Unicode-normalized

unicode character-encoding text-normalization

Updated May 31, 2024
Python

seanghay / tha

📢 Tha (ថា) - A Khmer Text Normalization and Verbalization Toolkit

cambodia khmer text-normalization khmer-language text-verbalization

Updated May 29, 2024
Python

vn33 / Intensity-Analysis-EmotionClassification

Predict emotions (happiness, anger, sadness) from WhatsApp chat data using machine learning and deep learning models. Includes text normalization, vectorization (TF-IDF, BoW, Word2Vec, GloVe), and model evaluation.

machine-learning natural-language-processing deep-learning text-classification word2vec hyperparameter-tuning bidirectional-lstm countvectorizer glove-embeddings text-normalization emotion-classification tf-idf-vectorizer word2vec-embeddinngs

Updated May 28, 2024
Jupyter Notebook

vn33 / Ecommerce-Product-Categorization

Accurate categorization of eCommerce products improves user experience and boosts search engine visibility. The project goal is to classify products into 14 predefined categories using their descriptions sourced from an eCommerce platform.

natural-language-processing ecommerce text-classification text-normalization streamlit-webapp

Updated May 19, 2024
Jupyter Notebook

Aayshashukla / SentimentAnalysis

Twitter Sentiment Analysis using Natural Language Processing (NLP)

python nlp text-mining text-classification kaggle artificial-intelligence logistic-regression nlp-machine-learning twitter-data text-normalization

Updated May 17, 2024
Jupyter Notebook

csebuetnlp / normalizer

This Python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation". It is intended for normalizing and cleaning Bengali and English text.

text-processing text-normalization text-preprocessing bangla-text-normalization bengali-text-normalization

Updated May 7, 2024
Python

ikegami-yukino / neologdn

Japanese text normalizer for mecab-neologd

nlp japanese-language preprocessing mecab-ipadic-neologd text-normalization

Updated May 2, 2024
Cython

Aalaa4444 / Text_Processing-and-Unique_Word_Extraction_fromHTML

Extract text content from an HTML page, process it, and extract unique words from the processed text. This notebook utilizes various text processing techniques including cleaning, normalization, tokenization, lemmatization or stemming, and stop words removal.

tokenizer text-extraction requests data-extraction beautifulsoup text-processing tokenization stemming lemmatization stopwords-removal text-cleaning text-normalization extract-html text-tokenization text-lemmatization

Updated Apr 5, 2024
Jupyter Notebook

sugatagh / E-commerce-Text-Classification

Proper categorization of e-commerce products enhances the user experience and achieves better results with external search engines. The objective of the project is to classify a product into four given categories, based on its description available on an e-commerce platform.

natural-language-processing text-classification word2vec e-commerce tf-idf text-normalization product-categorization