Node.js module for extracting text from HTML, PDF, DOC, DOCX, XLS, XLSX, CSV, PPTX, PNG, JPG, GIF, RTF and more!
Updated May 21, 2024 - HTML
Apache Tika: a toolkit that detects and extracts metadata
Text Processing & Segmentation Framework
Extract specific paragraphs out of Joplin notes using keywords, hashtags or custom tags
C# and VB.NET samples for Docotic.Pdf library
Extract text from a document using Apache Tika
VNDB explorer and VNR-like text hooker.
Text extraction from multiple and large PDF documents.
View PDF on X11 and the Linux framebuffer; resize PDF; convert PDF to text, HTML, TeX, groff
Extract text from plaintext, .docx, .odt and .rtf files. Pure Go.
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
Extract text and data from documents with OCR and NER
Twitter text processing library (auto linking and extraction of usernames, lists and hashtags). Based on the Ruby and Java implementations by Matt Sanford
Free PHP library to extract the main content from an article post or news post, including images and HTML
An R package to extract text from pdf.
R Interface to Apache Tika
Build client-side search across multiple documents in your file storage
Scripts engineered for R&D to extract text from audio, video, and websites necessary to improve their 'Unfold' app algorithm