
Midterm Project for DATA 2040 @ Brown University | Data Science Initiative


reach2sayan/Bengali-Grapheme_DATA2040


Team Onubaad - Bengali Handwritten Letter Classification

This project was conducted by Wanxin Ye and Sayan Samanta in partial fulfillment of the credit requirements for DATA 2040 - Hands-on Deep Learning, part of the master's degree in Data Science at the Data Science Initiative | Brown University.

Bengali (used synonymously with Bangla) ranks among the top five most-spoken languages in the world, with hundreds of millions of speakers. In this repository, we attempt to classify Bengali handwritten characters into their three structural components - grapheme root, vowel diacritic and consonant diacritic.
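Because each character decomposes into these three components, models for this task are typically built as one shared feature extractor with three classification heads. Below is a minimal NumPy sketch of that idea; the class counts (168 grapheme roots, 11 vowel diacritics, 7 consonant diacritics) are taken from the Kaggle dataset, while the 128-dimensional backbone features and random weights are purely illustrative stand-ins for a trained CNN:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# stand-in for features produced by a shared CNN backbone:
# a batch of 4 images, each mapped to a 128-dim feature vector
features = rng.normal(size=(4, 128))

# one linear + softmax head per target component
heads = {"grapheme_root": 168, "vowel_diacritic": 11, "consonant_diacritic": 7}
weights = {name: rng.normal(size=(128, n)) for name, n in heads.items()}

# each head yields a probability distribution over its own classes
preds = {name: softmax(features @ W) for name, W in weights.items()}
```

In a real model the three heads are trained jointly, with the total loss being a (possibly weighted) sum of the three per-head cross-entropy losses.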

The problem statement and the data are obtained from the Kaggle competition.


A detailed description of the work can be found in the series of Medium posts:

  1. Bangla Character Recognition System - The Deep Learning way (1/n)
  2. Bangla Character Recognition System - The Deep Learning way (2/n)
  3. Bangla Character Recognition System - The Deep Learning way (3/n)

Here we give a brief description of the contents of the repository without going into implementation details or a description of the data. Please read the blog posts for further details.


Major Notebooks:

  1. Image Processing - Contains code to de-noise, threshold and crop images based on contours.
  2. Data Distribution - Primary EDA notebook. Contains all analysis of the distribution and balance of the different target classes in the dataset.
  3. Augmentation,Generators,Architectures_2_colab - Main notebook. Contains code for the different architectures that were tried: AlexNet, BengaliNet (the kernel shared on Piazza), ResNet (mini), InceptionNet (mini), InceptualNet (Inception + residual net), FractalNet and DenseNet (mini) - all in-house implementations.
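To illustrate the pre-processing done in the first notebook, here is a minimal NumPy sketch of the threshold-and-crop step. The notebook itself works with contour detection; this simplified version instead crops to the bounding box of above-threshold pixels, and the threshold and padding values are arbitrary choices for the toy example, not the notebook's actual parameters:

```python
import numpy as np

def threshold_and_crop(img, thresh=30, pad=2):
    """Binarize a grayscale character image (ink assumed brighter than
    the threshold) and crop it to the bounding box of the ink pixels."""
    ink = img > thresh                       # boolean mask of ink pixels
    if not ink.any():                        # blank image: return unchanged
        return img
    rows = np.where(ink.any(axis=1))[0]      # rows containing ink
    cols = np.where(ink.any(axis=0))[0]      # columns containing ink
    r0, r1 = max(rows[0] - pad, 0), min(rows[-1] + pad + 1, img.shape[0])
    c0, c1 = max(cols[0] - pad, 0), min(cols[-1] + pad + 1, img.shape[1])
    return img[r0:r1, c0:c1]

# toy example: a 10x10 image with a 3x3 "stroke" in the middle
img = np.zeros((10, 10), dtype=np.uint8)
img[4:7, 4:7] = 200
cropped = threshold_and_crop(img, thresh=30, pad=1)   # 5x5 padded crop
```

Cropping to the character's bounding box before resizing keeps the stroke as large as possible in the network's input resolution.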

Note: there are redundant copies of similarly named files. We had to move to Google Colab and decided to load the entire pre-processed dataset into RAM instead of using input pipelines, hence the redundancy.

Also, the notebook Augmentation,Generators,Architectures contains the baseline models tested on ResNet50, VGGNet, XceptionNet and InceptionNet with pretrained weights.

For further queries, you can email me at [email protected].
