
Aadhar OCR using Python and Tesseract : Fetch name, gender, date of birth, aadhaar number, address from Aadhaar card using OCR


anujhsrsaini/Aadhar-OCR



Aadhaar OCR using Tesseract


About The Project

This project is a Python-based tool that extracts and digitizes text from Aadhaar cards, the unique identification cards issued by the Government of India. Its aim is to automate data extraction from Aadhaar cards so that the card data can be integrated more easily into applications, databases, and other systems.


Disclaimer: The Aadhaar sample images used in this project were found through a Google search.

Features

  • Aadhaar Card Text Extraction: The project uses OCR to extract key information from Aadhaar cards, including the Aadhaar number, the holder's name, gender, date of birth, and address. OCR is performed by Tesseract, an open-source OCR engine.

  • Customization: Users can adapt the OCR process to variations in Aadhaar card formats and designs.

  • Open Source: This project is open source and can be freely used, modified, and extended by the community.
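The field extraction described above can be sketched as a post-processing step over Tesseract's raw output. This is a minimal illustration, not the project's own main.py: the helper `parse_front_text` and its regular expressions are hypothetical, and they assume the front side prints a 12-digit number in groups of four, a DD/MM/YYYY date of birth, and an English gender label. Name extraction is left out because its position varies between card layouts.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns, assuming the usual front-side layout.
# Aadhaar numbers are 12 digits, usually printed as three groups of four.
AADHAAR_RE = re.compile(r"\b(\d{4})\s?(\d{4})\s?(\d{4})\b")
# Date of birth printed as DD/MM/YYYY.
DOB_RE = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")
# \b keeps "MALE" from matching inside "FEMALE".
GENDER_RE = re.compile(r"\b(MALE|FEMALE|TRANSGENDER)\b", re.IGNORECASE)


def parse_front_text(text: str) -> Dict[str, Optional[str]]:
    """Pull the Aadhaar number, date of birth and gender out of raw OCR text."""
    number = AADHAAR_RE.search(text)
    dob = DOB_RE.search(text)
    gender = GENDER_RE.search(text)
    return {
        # Join the three groups so the number is stored without spaces.
        "aadhaar_number": "".join(number.groups()) if number else None,
        "dob": dob.group(1) if dob else None,
        "gender": gender.group(1).upper() if gender else None,
    }
```

Cards that show only a year of birth, or OCR noise inside the digit groups, would need extra handling beyond these patterns.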


Getting Started

Prerequisites

  1. Python 3.7.9: Make sure you have Python 3.7.9 installed on your system. You can download and install Python from the official Python website.
  2. Git
  3. Tesseract OCR: Tesseract is used for text extraction. Install Tesseract for your operating system by following the instructions on the Tesseract GitHub repository.
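Before installing, a quick check can confirm which Python interpreter is active and whether the `tesseract` executable is reachable on your PATH. The helper `check_prerequisites` below is hypothetical and not part of the repository; it only compares the major and minor version against 3.7, and it assumes Tesseract is on PATH. If it is not (for example, when the installer did not add it), you can still point main.py at the full Tesseract path instead.

```python
import shutil
import subprocess
import sys
from typing import Dict, Optional


def check_prerequisites(expected=(3, 7)) -> Dict[str, object]:
    """Report the running Python version and whether Tesseract is on PATH."""
    tesseract_path = shutil.which("tesseract")
    tesseract_version: Optional[str] = None
    if tesseract_path:
        # Depending on the release, the version banner may go to stdout or stderr.
        result = subprocess.run(
            [tesseract_path, "--version"], capture_output=True, text=True
        )
        output = (result.stdout or result.stderr).strip()
        tesseract_version = output.splitlines()[0] if output else None
    return {
        "python": "%d.%d.%d" % sys.version_info[:3],
        # Only major.minor is compared; the README targets 3.7.9.
        "python_matches": sys.version_info[:2] == tuple(expected),
        "tesseract_path": tesseract_path,
        "tesseract_version": tesseract_version,
    }


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```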

Installation

  1. Clone the repo and change into its directory:
     git clone https://github.com/anujhsrsaini/Aadhar-OCR.git
     cd Aadhar-OCR
  2. It is recommended that you set up a fresh virtual environment with Python 3.7.9 for this project, because the library versions in requirements.txt may conflict with your existing installations. Create and activate the environment, then install the required libraries into it:
     python -m venv venv
     venv\Scripts\activate          (Windows)
     source venv/bin/activate       (Linux/macOS)
     pip install -r requirements.txt
  3. Edit main.py to set your own Tesseract path and the paths to the front and back images of the Aadhaar card you want to process. Depending on the format of your Aadhaar card, you may need to adjust the back-side part of the code slightly, as described in the comments in main.py.

  4. Run the script; it prints the extracted information:
     python main.py
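Configuring the Tesseract path and processing the back-side image can be sketched as follows. This is not the code in main.py: `ocr_image` and `extract_address` are hypothetical helpers, the sketch assumes the `pytesseract` and Pillow packages are installed, and it assumes the back side carries an English "Address" label followed by an address that ends in a six-digit PIN code.

```python
import re
from typing import Optional

# Six-digit Indian PIN code, assumed to close the address block.
PIN_RE = re.compile(r"\b\d{6}\b")


def ocr_image(image_path: str, tesseract_cmd: Optional[str] = None) -> str:
    """Run Tesseract on one card image and return the raw text."""
    # Imported lazily so extract_address works without these packages.
    import pytesseract
    from PIL import Image

    if tesseract_cmd:
        # Point pytesseract at your own Tesseract executable.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    with Image.open(image_path) as img:
        # Grayscale conversion is a common, simple preprocessing step.
        return pytesseract.image_to_string(img.convert("L"))


def extract_address(back_text: str) -> Optional[str]:
    """Return the text between an 'Address' label and the PIN code, on one line."""
    match = re.search(r"address\s*:?", back_text, re.IGNORECASE)
    if not match:
        return None
    rest = back_text[match.end():]
    pin = PIN_RE.search(rest)
    # Stop at the PIN code if one is found; otherwise keep everything after the label.
    block = rest[: pin.end()] if pin else rest
    # Collapse the line breaks and repeated spaces left by OCR.
    return " ".join(block.split()) or None
```

A typical call would be `extract_address(ocr_image("back.jpg", tesseract_cmd="<your Tesseract path>"))`, where both paths are placeholders for your own files.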

Authors
