# scanned-documents

Here are 47 public repositories matching this topic...

papermerge / papermerge-core

This repository contains the source code of the Papermerge DMS backend core, REST API server, and frontend UI.

pdf ocr documents scanned-documents dms records-management digital-archives document-management-system

Updated May 24, 2024
Python

ispras / dedoc

Dedoc is a library (service) for automating document parsing and bringing documents to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser

html pdf ocr table-of-contents excel html-parser docx documents doc scanned-documents txt document-analysis odt pdf-parser table-recognition docx-parser document-content-extraction logical-structure-extraction

Updated May 23, 2024
Python

Udayraj123 / OMRChecker

Evaluate OMR sheets fast and accurately using a scanner 🖨 or your phone 🤳.

Updated May 22, 2024
Python

ad-si / awesome-scanning

A curated list of awesome projects to simplify and improve paper and document scanning.

scanner scanned-documents dms document-scanner scanning book-scanning book-scanner digitization book-digitization page-scanning

Updated Apr 19, 2024

MaxineXiong / Scraping-Scanned-PDF-Docs-using-OCR-with-RPA

This repository contains automation solutions that efficiently extract text from scanned PDF documents with consistent layouts. Using the Tesseract OCR engine, the UiPath RPA robot achieves nearly 90% accuracy, streamlining the process and significantly reducing manual workload.

ocr scanned-documents optical-character-recognition screen-scraping rpa robotic-process-automation uipath uipath-studio scanned-receipts uipath-modern-design uipath-classic-design

Updated Apr 17, 2024

ciur / papermerge

Open Source Document Management System for Digital Archives (Scanned Documents)

pdf django ocr archives scan scanned-documents dms document-management paperless

Updated Apr 7, 2024
Python

papermerge / documentation

Documentation for Papermerge DMS - Installation, Help, User Manual, REST API

documentation ocr archives help scan installation scanned-documents dms document-management user-manual contrbuting

Updated Apr 7, 2024
HTML

papermerge / papermerge-cli

Papermerge DMS command line utility

pdf ocr archive command-line-tool scanned-documents dms records-management document-management-system papermerge

Updated Feb 22, 2024
Python

legenscandary / scan

Automatic scan server software for scanners with a document feeder. It creates multi-page PDFs with selectable text (OCR) at the press of a single button.

pdf ocr samba scanned-documents shell-scripts pdf-generation scanning

Updated Feb 19, 2024
Shell

atgreen / paperless

Emacs-assisted PDF document filing

pdf emacs melpa scanned-documents paperless

Updated Jan 30, 2024
Emacs Lisp

deckerego / docmag

The web UI for Facile Search. Together with DocIndex, this UI can help you search the myriad of scanned documents you have been accumulating over the years. Using the power of Docker & Elasticsearch you can run a powerful search engine that lets you convert scanned (image-based) PDFs to searchable text, group documents by letterhead, run fuzzy s…

docker kubernetes pdf elasticsearch full-text-search scanned-documents

Updated Oct 26, 2023
Groovy

baltpeter / scanprep

Small utility to prepare scanned documents. Supports separating PDF files by separator pages and removing blank pages.

pdf image-processing scanned-documents scanning hacktoberfest

Updated May 6, 2024
Python

svitlana1209 / OCR-search

Text search using OCR, plus detection and recognition of tables in scanned documents.

python pdf opencv image ocr computer-vision pandas-dataframe tesseract text-recognition scanned-documents hough-transform contour-detection pytesseract angle-rotation detect-table-struct

Updated Oct 23, 2023
Python

deckerego / docidx

A document indexing daemon that can populate Elasticsearch indexes with the contents and metadata of a number of document types, including PDFs and image scans. It powers Facile Search but can be reused for anything that requires search indexing for scanned documents.

search-engine elasticsearch full-text-search scanned-documents pdf-search

Updated Oct 17, 2023
Java

4lex4 / scantailor-advanced

ScanTailor Advanced merges the features of the ScanTailor Featured and ScanTailor Enhanced versions and adds new features and fixes.

image-processing djvu scanned-documents book-scanning binarization digitalization

Updated Sep 13, 2023
C++

Viscomsoft / Scanner-TWAIN-SDK-ActiveX

For Windows developers who need to capture images from a scanner, digital camera, or capture card that has a TWAIN device driver, using C++, C#, VB.NET, VB, Delphi, Vfp, or MS Access.

sdk csharp dotnet scanner activex scanned-documents scannerbarcode twain twain-operation vbtwain

Updated May 22, 2023
Visual Basic .NET

binDebug3 / scanner_automation

A program by Dallin Stewart that automates simple, repetitive tasks while scanning documents.

automation data-entry scan-tool scanned-documents mortgage pyautogui pyautogui-automation

Updated May 12, 2023
Python

ahmetozlu / signature_extractor

A super lightweight image processing algorithm for detection and extraction of overlapped handwritten signatures on scanned documents using OpenCV and scikit-image.

ocr image-processing scanned-documents image-segmentation optical-character-recognition signature-verification ocr-engine signature-recognition signature-detection handwritten-signatures signature-extractor signature-extraction-algorithm

Updated Apr 20, 2023
Python

karolzak / boxdetect

BoxDetect is a Python package based on OpenCV which allows you to easily detect rectangular shapes like character or checkbox boxes on scanned forms.

opencv computer-vision forms checkbox documents checkboxes scanned-documents boxes handwritten-documents cv2 opencv-python bounding-boxes box-detection scanned-images rectangle-detection handwritten-character-recognition handwritten-characters scanned-image-pdfs handwritten-forms

Updated Jan 18, 2023
Python

hnjm / papermerge

Open Source Document Management System for Digital Archives (Scanned Documents)

python pdf django ocr archives scan scanned-documents dms document-management paperless hnjm

Updated Jan 5, 2023
Python
