
Text recognition (optical character recognition) with deep learning methods in Farsi.


Saeed-Biabani/Scene-Text-Recognition


Scene Text Recognition

Scene Text Recognition With Deep Learning Methods In Farsi.

Quick Links

Dependencies

  • Install dependencies: $ pip install -r requirements.txt
  • Download Pretrained Weights Here

Getting Started

Fig. 1: Model architecture.

  • Project Structure
.
├── src
│   ├── nn
│   │   ├── feature_extractor.py
│   │   ├── layers.py
│   │   └── ocr_model.py
│   └── utils
│       ├── dataset.py
│       ├── labelConverter.py
│       ├── loss_calculator.py
│       ├── misc.py
│       ├── trainUtils.py
│       └── transforms.py
├── config.py
└── train.py
  • Set the dataset paths in the config.py file.
ds_path = {
    "train_ds" : "path/to/train/dataset",
    "test_ds" : "path/to/test/dataset",
}
  • Dataset Structure (each image must contain exactly one word)
.
├── Images
│   ├── img_1.jpg
│   ├── img_2.jpg
│   ├── img_3.jpg
│   ├── img_4.jpg
│   └── img_5.jpg
│   ...
└── labels.json
  • labels.json Contents
{"img_1": "بالا", "img_2": "و", "img_3": "بدانند", "img_4": "چندین", "img_5": "به", ...}
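The dataset layout above can be read with a few lines of Python. The following is a minimal sketch, not the loader in src/utils/dataset.py: the function name `load_samples` and the assumption that every image uses the `.jpg` extension are illustrative only.

```python
import json
from pathlib import Path


def load_samples(ds_root, image_ext=".jpg"):
    """Pair each image under <ds_root>/Images with its word label.

    Hypothetical helper: assumes labels.json maps image file stems
    (e.g. "img_1") to word labels, as in the example above.
    """
    root = Path(ds_root)
    with open(root / "labels.json", encoding="utf-8") as f:
        labels = json.load(f)

    samples = []
    for stem, word in labels.items():
        image_path = root / "Images" / f"{stem}{image_ext}"
        if image_path.is_file():  # skip labels that have no matching image
            samples.append((image_path, word))
    return samples
```

Calling `load_samples("path/to/train/dataset")` would return a list of `(image_path, word)` pairs, one per labeled image that exists on disk.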

Overview

Training

Objective Function

Denote the training dataset by $\ TD = \langle X_i , Y_i \rangle$, where $\ X_i$ is a training image and $\ Y_i$ is its word label. Training minimizes the objective function, which is the negative log-likelihood of the conditional probability of the word label.

$$O = -\sum_{(X_i, Y_i) \in TD} \log P(Y_i|X_i)$$

This function calculates a cost from an image and its word label, and the modules in the framework are trained in an end-to-end manner.
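As a small worked illustration of the objective, the sketch below sums the negative log of the probability the model assigns to each correct label. The function name `negative_log_likelihood` and the probability values are made up for the example; in practice these probabilities come from the model.

```python
import math


def negative_log_likelihood(probs):
    """Return O = -sum(log P(Y_i | X_i)) over the given probabilities.

    `probs` holds one value P(Y_i | X_i) per training pair; each must be in (0, 1].
    """
    return -sum(math.log(p) for p in probs)


# Illustrative values only: probabilities assigned to three correct labels.
probs = [0.9, 0.8, 0.95]
objective = negative_log_likelihood(probs)
print(f"O = {objective:.4f}")
```

The objective is zero only when every correct label receives probability 1, and it grows as any of those probabilities falls.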

Fig. 2: Model Training History.

CTC Loss

CTC takes a sequence $\ H = h_1, \ldots, h_T$, where $\ T$ is the sequence length, and outputs the probability of a path $\ \pi$ (one symbol per time step), which is defined as

$$P(\pi|H) = \prod_{t = 1}^T y_{{\pi}_t}^t$$

where $\ y_{{\pi}_t}^t$ is the probability of generating character $\ \pi_t$ at time step $\ t$.
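The path probability can be computed directly from the per-step distributions. This is a minimal sketch under stated assumptions: `y` is a hypothetical T × K table of per-time-step symbol probabilities, `path` lists one symbol index per time step, and the numbers are invented for the example. It computes a single path's probability only, not the full CTC loss.

```python
import math


def path_probability(y, path):
    """Return P(pi | H) = prod_t y[t][pi_t] for a single path.

    y:    list of T rows, each a probability distribution over K symbols.
    path: list of T symbol indices (pi_1, ..., pi_T).
    """
    if len(y) != len(path):
        raise ValueError("path must have one symbol per time step")
    return math.prod(y[t][k] for t, k in enumerate(path))


# Illustrative distributions for T = 3 time steps and K = 3 symbols.
y = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
p = path_probability(y, [1, 0, 2])
```

Because this is a product of many values below 1, implementations typically work with log-probabilities to avoid numerical underflow on long sequences.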

| Model | Input Size | Recall | Precision | F1 | Params | Speed (img/s) |
|---|---|---|---|---|---|---|
| OCR-Base | $\ 1 \times 64 \times 192$ | $\ 0.993$ | $\ 0.997$ | $\ 0.997$ | $\ 35,023,143$ | $\ 89.24$ |

Samples

References

🛡️ License

This project is distributed under the MIT License.
