Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Updated May 18, 2024 - Python
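Text mining and preprocessing tools (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.), using unsupervised or weakly supervised methods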
🧹 Python package for text cleaning
Tools for cleaning and normalizing text data
A fork of Dragnet that also extracts the author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
Grammarify is an npm package that safely cleans up text that has misspellings, improper capitalization, and lexical illusions, among other things.
NLP pre-/post-processing tools.
Text preprocessing tools in Python.
A Python toolkit for file processing, text cleaning and data splitting.
Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼
Text preprocessing package that includes cleaning, tokenization, dataset preparation, etc.
Korean text data preprocessing toolkit for NLP
A Python package to get useful information from documents using the TopicRank algorithm.
Text preprocessing package for use in NLP tasks https://pypi.org/project/textcl/
JS / Python 3 / PHP library for working with UTF-8 polytonic Greek and Latin
4th place (top 1%) solution for Shopee Code League 2020 - Product Detection
Common Text Pre-Processing for Portuguese
Remove extra whitespace from text.
Text preprocessing in Python. Libs include string, re, nltk, spacy, gensim, textblob, unidecode, autocorrect, pyspellchecker
🖹 Offline Text Cleaner and Formatter