malayalam_asr_benchmarking

Objective of the project

Note

This project benchmarks automatic speech recognition (ASR) models for Malayalam. So far, it covers Malayalam ASR models based on Whisper and faster-whisper.

Benchmarked Datasets

So far, benchmarking has focused on two datasets:

Common Voice 11 Dataset

Benchmarking has been done on the Malayalam subset of Mozilla's Common Voice 11. The benchmarking results are published as a dataset.

Malayalam Speech Corpus

Benchmarking has been done on SMC's Malayalam Speech Corpus. The benchmarking results are published as a dataset.

Install

pip install malayalam_asr_benchmarking

Or from the GitHub repository:

# Ensure git is installed first, e.g. on Ubuntu: apt install git
pip install git+https://github.com/kurianbenoy/malayalam_asr_benchmarking.git

Or from a local clone:

# Ensure git is installed first, e.g. on Ubuntu: apt install git
git clone https://github.com/kurianbenoy/malayalam_asr_benchmarking.git
cd malayalam_asr_benchmarking
pip install -e .
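After any of the install routes above, a quick check can confirm that the package is visible to the current Python environment. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `is_importable` are illustrative and not part of this project, and the check assumes the distribution and the importable module share the name `malayalam_asr_benchmarking`.

```python
from importlib import metadata, util


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_importable(module_name):
    """Return True if a top-level module can be located (without importing it)."""
    return util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: distribution name and module name are identical.
    name = "malayalam_asr_benchmarking"
    print(name, "version:", installed_version(name))
    print(name, "importable:", is_importable(name))
```

If the version prints as `None`, the package is probably installed in a different environment from the one running the script.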

Setting up your development environment

This project is developed with nbdev. Please take some time to read up on nbdev (how it works, its directives, and so on) by checking out the walkthroughs and tutorials on the nbdev website.

Step 1: Install Quarto:

nbdev_install_quarto

Other installation options are described in Quarto's Get Started guide.

Step 2: Install hooks

nbdev_install_hooks

Step 3: Install our library

pip install -e '.[dev]'

How to use

Evaluate Whisper-based Malayalam ASR models

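from malayalam_asr_benchmarking.commonvoice import evaluate_whisper_model_common_voice

# Lists passed to the evaluation function to collect WER, CER, model size, and time
werlist = []
cerlist = []
modelsizelist = []
timelist = []

# Evaluate a Whisper-based model on the Common Voice dataset
evaluate_whisper_model_common_voice("parambharat/whisper-tiny-ml", werlist, cerlist, modelsizelist, timelist)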

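from malayalam_asr_benchmarking.msc import evaluate_whisper_model_msc

# Lists passed to the evaluation function to collect WER, CER, model size, and time
werlist = []
cerlist = []
modelsizelist = []
timelist = []

# Evaluate a Whisper-based model on the Malayalam Speech Corpus (MSC)
evaluate_whisper_model_msc("parambharat/whisper-tiny-ml", werlist, cerlist, modelsizelist, timelist)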
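The names `werlist` and `cerlist` suggest that the evaluation reports word error rate (WER) and character error rate (CER). As background, here is a minimal pure-Python sketch of how these metrics are commonly defined: the edit distance between reference and hypothesis, divided by the reference length. This is not necessarily how this library computes them (it may rely on a dedicated metrics package and text normalization), and the function names here are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, counted over Unicode code points.

    Note: for Malayalam, one visible character can span several code points,
    so a grapheme-aware CER may give different values.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Both values are 0.0 for a perfect transcript and can exceed 1.0 when the hypothesis contains many insertions.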

Evaluate faster-whisper based models

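from malayalam_asr_benchmarking.commonvoice import evaluate_faster_whisper_model_common_voice

# Lists passed to the evaluation function to collect WER, CER, model size, and time
werlist = []
cerlist = []
modelsizelist = []
timelist = []

# Evaluate a faster-whisper based model on the Common Voice dataset
evaluate_faster_whisper_model_common_voice("kurianbenoy/vegam-whisper-medium-ml", werlist, cerlist, modelsizelist, timelist)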

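from malayalam_asr_benchmarking.msc import evaluate_faster_whisper_model_msc

# Lists passed to the evaluation function to collect WER, CER, model size, and time
werlist = []
cerlist = []
modelsizelist = []
timelist = []

# Evaluate a faster-whisper based model on the Malayalam Speech Corpus (MSC)
evaluate_faster_whisper_model_msc("kurianbenoy/vegam-whisper-medium-ml", werlist, cerlist, modelsizelist, timelist)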
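Because every evaluation function above takes the same five arguments, several models can be benchmarked in one loop and the results gathered into rows. The sketch below assumes, without confirmation from the library, that each call appends exactly one value to each of the four lists; `benchmark_models` is a hypothetical helper and not part of this package.

```python
def benchmark_models(evaluate_fn, model_names):
    """Run evaluate_fn on each model and return one result row per model.

    Assumption: evaluate_fn(name, werlist, cerlist, modelsizelist, timelist)
    appends exactly one entry to each list per call.
    """
    model_names = list(model_names)
    werlist, cerlist, modelsizelist, timelist = [], [], [], []
    for name in model_names:
        evaluate_fn(name, werlist, cerlist, modelsizelist, timelist)
    # Pair each model with the values collected for it, in call order.
    return [
        {"model": m, "wer": w, "cer": c, "size": s, "time": t}
        for m, w, c, s, t in zip(model_names, werlist, cerlist,
                                 modelsizelist, timelist)
    ]


# Usage with this library (requires the package, the models, and the dataset):
# from malayalam_asr_benchmarking.msc import evaluate_whisper_model_msc
# rows = benchmark_models(evaluate_whisper_model_msc,
#                         ["parambharat/whisper-tiny-ml"])
```

Passing the evaluation function as an argument lets the same helper serve the Common Voice and MSC variants, and the Whisper and faster-whisper variants, without duplication.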
