Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents.
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
A small, lightweight HTTP server that converts photos, images, and scanned documents to text using optical character recognition (OCR) powered by Google Tesseract.
CCExtractor - Official version maintained by the core team
Android document scanning app
Web scraper for extracting data from online newspapers
Docker image with the latest Tesseract OCR version 5.x.x, built from source
FastAPI server for classifying documents and extracting data
Tesseract-based OCR for Android
Tesseract Open Source OCR Engine (main repository)
⚡ Extracts the Machine Readable Zone (MRZ) from images of passports or other documents.
A 6 MB Tesseract build (with English training data) that fits inside AWS Lambda
Converts the "Dictionary of the Old Danish Language" into easier-to-use data formats
The open-sourced version of the award-winning Qiqqa research management tool for Windows (a bleeding-edge dev fork). File any issues you find in the main repo issue tracker at https://github.com/jimmejardine/qiqqa-open-source/issues
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched