mkczyk/ocr-examples

OCR examples

OCR examples with Tesseract

Features

An application that uses Tesseract and Tess4J to expose a REST API for testing various OCR options. The tests also contain a few very simple example snippets.

API for testing Tesseract

  • OCR an image by providing the absolute path to the file
  • OCR an image by sending the file
  • Selecting the Tesseract engine mode and page segmentation mode
  • Returning the result as plain text or hOCR
  • Specifying languages (missing dictionaries are downloaded automatically)
  • Saving a file after OCR (text file, PDF with text layer)

See Swagger for details: http://localhost:8080/swagger-ui/

The application must be running locally (see Running below).

Swagger endpoints

Screenshots: Swagger tess-ocr-text ocrByImage

Examples of usage

  • The simplest usage of Tesseract
  • Generating hOCR
  • OCR from a PDF file using PDFBox

See the test folder for details: pl.marcinkowalczyk.ocr.examples.tesseract.

Running

Prerequisites: JDK 11 installed (for example, AdoptOpenJDK).

  1. Build the application:
    • Linux: ./mvnw clean package
    • Windows: mvnw.cmd clean package
  2. Run the application: java -jar target/ocr-examples-0.0.1-SNAPSHOT.jar
  3. Open a browser at: http://localhost:8080/

Usage

Simple usage

Send an HTTP GET request, passing the absolute path of the image file to OCR as the absolute query parameter.

http://localhost:8080/api/tess/path?absolute=<absolute_path_to_image_file>

Example:

http://localhost:8080/api/tess/path?absolute=C:/dev/ocr/ocr-examples/src/test/resources/test_image.png
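
The same request can also be sent from code. The following is a minimal sketch using the JDK 11 HttpClient, assuming the application is running locally on port 8080 and that the endpoint returns the recognized text in the response body. The class name OcrPathClient and its methods are hypothetical and not part of this repository. The path is percent-encoded so that characters such as ':' and spaces are passed safely as a query value.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of the ocr-examples repository.
class OcrPathClient {

    // Endpoint taken from the README; assumes the default local port.
    static final String BASE_URL = "http://localhost:8080/api/tess/path";

    // Builds the request URI with the absolute path percent-encoded.
    static URI buildUri(String absolutePath) {
        String encoded = URLEncoder.encode(absolutePath, StandardCharsets.UTF_8)
                .replace("+", "%20"); // URLEncoder emits '+' for spaces
        return URI.create(BASE_URL + "?absolute=" + encoded);
    }

    // Sends the GET request and returns the response body.
    // Assumption: the body contains the OCR result as text.
    static String ocrByPath(String absolutePath)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(buildUri(absolutePath))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status: " + response.statusCode());
        }
        return response.body();
    }
}
```

With the application running, calling OcrPathClient.ocrByPath("C:/dev/ocr/ocr-examples/src/test/resources/test_image.png") from a main method should return whatever the endpoint responds with for that image.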

Swagger

For more endpoints and parameters, explore Swagger: http://localhost:8080/swagger-ui/
