Skip to content

Low Rank Adaptation Tuning (LoRA) of Whisper-large-v2 for subtitle generation of Bengali YouTube videos.

License

Notifications You must be signed in to change notification settings

ucalyptus/bongovaad

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

bongovaad (বঙ্গবাদ) PyPI version

bongovaad is a Python package that provides functionality for transcribing audio from YouTube videos. It utilizes the powerful ASR (Automatic Speech Recognition) model provided by the Whisper library from Hugging Face.

Features

  • We have already LoRA-tuned the whisper-large-v2 model on the 'bn' subset of Mozilla Common Voice 13, obtaining a Word-Error-Rate(WER) of 57, compared to the WER of 103.4 obtained by the original OpenAI paper(Page 23). More information available here.
  • Handles audio segmentation for longer videos using AudioSegment.

ToDo

  • Automatic SRT file creation

Installation

Before using bongovaad, you need to install ffmpeg:

sudo apt install ffmpeg -y

To install bongovaad, you can use pip: pip install bongovaad

Usage

bongovaad provides a command-line interface (CLI) that allows you to transcribe audio from YouTube videos. Here's how to use it:

bongovaad --url <youtube_url>

Replace <youtube_url> with the actual YouTube URL of the video you want to transcribe. The output will be written to text files containing the transcriptions of the audio segments.

Example

bongovaad --url https://www.youtube.com/watch?v=ABC12345

This command transcribes the audio from the YouTube video with the specified URL (https://www.youtube.com/watch?v=ABC12345).

License

This project is licensed under the MIT License. See the LICENSE file for more information.

Contributing

Contributions are welcome! Please refer to the contributing guidelines for more information. If you encounter any issues or have suggestions for improvements, please create a new issue on the GitHub repository.

Acknowledgements

About

Low Rank Adaptation Tuning (LoRA) of Whisper-large-v2 for subtitle generation of Bengali YouTube videos.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages