BatchWhisper-Transcription-Translation (CPU & GPU Supported)

This Python script automates transcribing and translating batches of audio and video files across many languages. It uses Whisper, run either on the local device (CPU or GPU) or through the API.

Prerequisites

Python 3.8 or higher
Runs on the local device (CPU or GPU) or through the API
If you use the API, store your API key in the OPENAI_API_KEY system environment variable.
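
A minimal sketch of how a script might check for the key before making any API call. The helper name get_api_key is hypothetical and not part of this repository; only the variable name OPENAI_API_KEY comes from the text above.

```python
import os


def get_api_key(env=None):
    """Return the API key stored in OPENAI_API_KEY, or raise if it is missing.

    `env` defaults to the process environment; passing a dict makes the
    function easy to try out without touching real environment variables.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it to your system environment variables."
        )
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error returned halfway through a batch.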

Installation

Clone this repository: git clone https://github.com/TtesseractT/BatchWhisper-Transcription-Translation
Change into the repository directory: cd BatchWhisper-Transcription-Translation
For a fresh install that includes conda, CUDA, Python, and PyTorch (GPU), run windows_setup.bat
Run the script: python Run.py --type {Type Number}

Argument	Description
--type 1	Text to Audio Segments
--type 2	Text to Audio Segments with Translation
--type 3	Audio Translation (CPU)
--type 4	Audio Translation (GPU)
--type 5	Audio Transcription (CPU)
--type 6	Audio Transcription (GPU)
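
The table above could be handled in code roughly as follows. This is a sketch, not the actual contents of Run.py: the names PROCESS_TYPES and parse_process_type are hypothetical, and mapping "GPU" to the device string "cuda" is an assumption.

```python
import argparse

# Process types from the table above: number -> (description, device).
# The device is None where the table does not name one.
# Assumption: "GPU" corresponds to the device string "cuda".
PROCESS_TYPES = {
    1: ("Text to Audio Segments", None),
    2: ("Text to Audio Segments with Translation", None),
    3: ("Audio Translation", "cpu"),
    4: ("Audio Translation", "cuda"),
    5: ("Audio Transcription", "cpu"),
    6: ("Audio Transcription", "cuda"),
}


def parse_process_type(argv=None):
    """Parse --type and return (number, description, device)."""
    parser = argparse.ArgumentParser(description="Select a process type.")
    parser.add_argument(
        "--type",
        dest="process_type",
        type=int,
        required=True,
        choices=sorted(PROCESS_TYPES),
        help="process type number (1 to 6)",
    )
    args = parser.parse_args(argv)
    description, device = PROCESS_TYPES[args.process_type]
    return args.process_type, description, device
```

For example, parse_process_type(["--type", "4"]) yields the GPU translation entry, while a value outside 1 to 6 makes argparse exit with a usage message.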

Usage

Supported Input File Types:

Format	Description	Format	Description	Format	Description
3GP	Mobile Phone Video	AAC	Advanced Audio Codec	AC3	Audio Codec 3
AIF, AIFF	Audio Interchange File Format	AMR	Adaptive Multi-Rate Audio Codec	APE	Monkey's Audio Format
ASF	Advanced Streaming Format	AVI	Audio Video Interleaved Format	CAF	Core Audio Format
DTS	Digital Theater Systems Audio	FLAC	Free Lossless Audio Codec	M4A, M4B	MPEG-4 Audio Layer
MIDI	Musical Instrument Digital Interface	MKV	Matroska Multimedia Container	MOV	Apple QuickTime Movie
MP4	MPEG-4 Part 14 Container	MPEG	Moving Picture Experts Group Video	OGA, OGG	Ogg Vorbis Audio
RA	RealAudio	RM	RealMedia	WAV	Waveform Audio Format
WebM	Web Media Format	WMA	Windows Media Audio	WV	WavPack Audio Format
AVCHD	Advanced Video Codec High Definition	DV	Digital Video Format	FLV	Flash Video Format
M2TS, MTS	MPEG-2 Transport Stream	MJPEG	Motion JPEG Video Format	MPEG-1	Moving Picture Experts Group Video
MPEG-2	Moving Picture Experts Group Video	MPEG-4	Moving Picture Experts Group Video	RMVB	RealMedia Variable Bitrate Format
SWF	Shockwave Flash Movie	VOB	DVD Video Object	WMV	Windows Media Video
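
A sketch of how input files might be filtered by extension before processing. The function name find_input_files and the SUPPORTED_EXTENSIONS set are hypothetical, and the set holds only a subset of the formats listed above; the repository may detect formats differently.

```python
from pathlib import Path

# A subset of the extensions from the table above; extend as needed.
SUPPORTED_EXTENSIONS = {
    ".aac", ".avi", ".flac", ".m4a", ".mkv",
    ".mov", ".mp4", ".ogg", ".wav", ".webm", ".wma",
}


def find_input_files(input_dir="Input-Videos"):
    """Return supported files directly inside input_dir, sorted by path."""
    root = Path(input_dir)
    return sorted(
        path
        for path in root.iterdir()
        # Compare lowercased suffixes so "CLIP.WAV" and "clip.wav" both match.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```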

Supported Languages:

Language
Afrikaans	Albanian	Amharic	Arabic	Armenian	Assamese
Azerbaijani	Bashkir	Basque	Belarusian	Bengali	Bosnian
Breton	Bulgarian	Burmese	Castilian	Catalan	Chinese
Croatian	Czech	Danish	Dutch	English	Estonian
Faroese	Finnish	Flemish	French	Galician	Georgian
German	Greek	Gujarati	Haitian	Haitian Creole	Hausa
Hawaiian	Hebrew	Hindi	Hungarian	Icelandic	Indonesian
Italian	Japanese	Javanese	Kannada	Kazakh	Khmer
Korean	Lao	Latin	Latvian	Letzeburgesch	Lingala
Lithuanian	Luxembourgish	Macedonian	Malagasy	Malay	Malayalam
Maltese	Maori	Marathi	Moldavian	Moldovan	Mongolian
Myanmar	Nepali	Norwegian	Nynorsk	Occitan	Panjabi
Pashto	Persian	Polish	Portuguese	Punjabi	Pushto
Romanian	Russian	Sanskrit	Serbian	Shona	Sindhi
Sinhala	Sinhalese	Slovak	Slovenian	Somali	Spanish
Sundanese	Swahili	Swedish	Tagalog	Tajik	Tamil
Tatar	Telugu	Thai	Tibetan	Turkish	Turkmen
Ukrainian	Urdu	Uzbek	Valencian	Vietnamese	Welsh
Yiddish	Yoruba
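
Several entries in the list are alternative names for the same language (for example, Burmese and Myanmar, or Valencian and Catalan); the open-source Whisper tokenizer treats such names as aliases of one language code. A sketch of resolving them to a single name is shown below. The table is partial, and the names LANGUAGE_ALIASES and canonical_language are hypothetical rather than taken from language_dict.py.

```python
# Partial alias table: alternative name -> primary name (illustrative only).
LANGUAGE_ALIASES = {
    "burmese": "myanmar",
    "castilian": "spanish",
    "flemish": "dutch",
    "haitian": "haitian creole",
    "letzeburgesch": "luxembourgish",
    "moldavian": "romanian",
    "moldovan": "romanian",
    "panjabi": "punjabi",
    "pushto": "pashto",
    "sinhalese": "sinhala",
    "valencian": "catalan",
}


def canonical_language(name):
    """Lowercase a language name and resolve known aliases to one spelling."""
    key = name.strip().lower()
    return LANGUAGE_ALIASES.get(key, key)
```

Names that are not in the alias table pass through unchanged apart from lowercasing, so canonical_language("French") returns "french".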

Supported output file types (process types 3, 4, 5, and 6):

Text Format (txt)

JSON Format (json)

WebVTT Format (vtt)

SubRip Subtitle Format (srt)

Tab Separated Values Format (tsv)

To use this script, follow these steps:

Place your audio or video files in the Input-Videos directory.
Run the script using the following command: python Run.py --type <process-type>

Replace <process-type> with the type of process you want to run (1 to 6). The available process types are:

Argument	Description
--type 1	Text to Audio Segments
--type 2	Text to Audio Segments with Translation
--type 3	Audio Translation (CPU)
--type 4	Audio Translation (GPU)
--type 5	Audio Transcription (CPU)
--type 6	Audio Transcription (GPU)

If you choose process types 3, 4, 5, or 6, you will be prompted to select a language and an output format.

The output files will be saved in the Videos directory.

Usage for Whisper Developer Backend

Argument
--model {tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large-v1,large-v2,large}
--model_dir MODEL_DIR
--device DEVICE
--output_dir OUTPUT_DIR
--output_format {txt,vtt,srt,tsv,json,all}
--verbose VERBOSE
--task {transcribe,translate}
--temperature TEMPERATURE
--best_of BEST_OF
--beam_size BEAM_SIZE
--patience PATIENCE
--length_penalty LENGTH_PENALTY
--suppress_tokens SUPPRESS_TOKENS
--initial_prompt INITIAL_PROMPT
--condition_on_previous_text CONDITION_ON_PREVIOUS_TEXT
--fp16 FP16
--temperature_increment_on_fallback TEMPERATURE_INCREMENT_ON_FALLBACK
--compression_ratio_threshold COMPRESSION_RATIO_THRESHOLD
--logprob_threshold LOGPROB_THRESHOLD
--no_speech_threshold NO_SPEECH_THRESHOLD
--word_timestamps WORD_TIMESTAMPS
--prepend_punctuations PREPEND_PUNCTUATIONS
--append_punctuations APPEND_PUNCTUATIONS
--threads THREADS
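
A sketch of assembling a Whisper command line from a few of the arguments above. The function build_whisper_command and its defaults are hypothetical, and this is not necessarily how the scripts in this repository call Whisper. The --language flag belongs to the Whisper command-line tool but is not in the list above.

```python
def build_whisper_command(
    audio_path,
    task="transcribe",
    model="small",
    device="cpu",
    output_format="srt",
    output_dir="Videos",
    language=None,
):
    """Return the argument list for one Whisper command-line invocation."""
    if task not in {"transcribe", "translate"}:
        raise ValueError(f"unknown task: {task!r}")
    if output_format not in {"txt", "vtt", "srt", "tsv", "json", "all"}:
        raise ValueError(f"unknown output format: {output_format!r}")

    command = [
        "whisper", str(audio_path),
        "--model", model,
        "--task", task,
        "--device", device,
        "--output_format", output_format,
        "--output_dir", str(output_dir),
    ]
    if language:
        # Optional: --language is a Whisper CLI flag, not listed above.
        command += ["--language", language]
    return command
```

The returned list can be passed to subprocess.run(command, check=True). Building a list instead of a single string avoids quoting problems with file names that contain spaces.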

Input / Output structure

Expected input (root directory):

Input-Videos
	Video 1
	Video 2
	Video 3
	Video 4
	...
	Video [N]

Expected output (root directory):

Videos
	Video -1
		Video 1 - File
		Transcription File
		Audio Segment - File
	Video -2
		Video 2 - File
		Transcription File
		Audio Segment - File

License

This project is licensed under the terms of the MIT license. See LICENSE for more information.

Author

Built by Sabian Hibbs.
