Project under construction. First complete release early May 2024 🚧👷♂️
🟡 On some operating systems, the dark theme template of pyqtdarktheme is not found.
🟡 Audio capture bug: the first 0.2 seconds of each recording are corrupted, likely due to audio driver latency or erratic Python multithreading behavior.
🔴 Hugging Face export is not complete yet.
With the rising number of open source ASR/NLP projects democratizing the AI human-machine interface comes the need for better ASR datasets. Whisper Temple is a simple-to-use platform for creating synthetic speech datasets as pairs of audio and text. Transcription is powered by faster-whisper ⏩, and synthetic transcriptions can be edited in the UI dataset viewer. The user interface is built with PyQt5 and runs entirely on your local machine.
This application serves as a Synthetic Speech Generator, enabling users to transcribe captured audio and manage generated datasets. It provides a user-friendly interface for configuring audio parameters, transcription options, and dataset management.
- Audio Capture: Users can capture audio samples with customizable settings such as sample rate and duration.
- Transcription: Provides the option to transcribe captured audio into text.
- Audio Metadata: Allows users to add metadata to the dataset, such as audio sample rate and duration.
- Dataset Management: Enables users to view, delete, and manage entries in the generated dataset.
- Export: Allows exporting the dataset for further processing or uploading to Hugging Face 🤗.
- Added metadata to each dataset entry: audio sample rate, length, or speaker gender and age.
- Updated the README with a UI screenshot and video.
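To make the metadata concrete, here is a minimal sketch of how one entry could be appended to a project's `metadata.csv`. The column names besides `file_name` are illustrative assumptions, not necessarily the app's actual schema:

```python
import csv
from pathlib import Path

# Hypothetical metadata.csv layout; the column names are assumptions,
# check the file generated by your own project for the real schema.
FIELDS = ["file_name", "transcription", "sample_rate", "duration_ms",
          "speaker_gender", "speaker_age"]

def append_entry(project_dir: str, row: dict) -> None:
    """Append one dataset entry to <project>/metadata.csv, writing a header if new."""
    path = Path(project_dir) / "metadata.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)

append_entry("my_project", {
    "file_name": "Audios/0001.wav",
    "transcription": "hello world",
    "sample_rate": 16000,
    "duration_ms": 5000,
    "speaker_gender": "f",
    "speaker_age": 31,
})
```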
First, I suggest you create and activate a new virtual environment using `conda` or `virtualenv`. Then follow the steps below ⬇️
- Clone the repository: `git clone https://github.com/gongouveia/Syntehtic-Speech-Dataset-Generator.git`
- Install the dependencies: `pip install -r requirements.txt`
- Follow the instructions in the Usage section.
- Launch the application and create or continue a project by running `python speech_gen.py --project <project_name> --theme <'auto'|'light'|'dark'>` (theme defaults to 'auto').
- Configure audio capture parameters such as sample rate in Hz (default: 16000) and duration in milliseconds (default: 5000).
- If CUDA is found, it is possible to transcribe each audio record at the end of each recording. Otherwise, you can batch-transcribe the audios in the Dataset Viewer.
- Choose whether to use the VAD option in transcription; it is enabled by default and allows for faster transcription (see the sketch after this list).
- Click on "Capture Audio" to start a new audio recording.
- View and manage the audio dataset using the provided menu options.
- Edit weak transcriptions, creating an even more robust training dataset for Whisper.
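For reference, a minimal sketch of a single-file transcription with faster-whisper's VAD filter. The model size, device, and file path are placeholders; this is not necessarily how `speech_gen.py` wires it internally:

```python
from faster_whisper import WhisperModel

# "small" is a placeholder model size; device="cuda" requires a CUDA-capable GPU,
# otherwise fall back to device="cpu".
model = WhisperModel("small", device="cuda", compute_type="float16")

# vad_filter=True skips silent regions, which is what makes transcription faster.
segments, info = model.transcribe("Audios/0001.wav", vad_filter=True)

print(f"Detected language: {info.language}")
text = " ".join(segment.text.strip() for segment in segments)
print(text)
```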
If the idiom argument is set to 'en', the languages dropdown menu is not available.
If option 3 is disabled, it is possible to transcribe all the captured audios in the Dataset Viewer window (a sketch of this batch pass follows below). You can add audios to the audio dataset by pasting them into the `/Audios` folder under your desired project.
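A hedged sketch of that batch pass over a project's `/Audios` folder (CPU fallback shown; paths and model size are again placeholders):

```python
from pathlib import Path
from faster_whisper import WhisperModel

# int8 on CPU is a reasonable fallback when no CUDA device is found.
model = WhisperModel("small", device="cpu", compute_type="int8")

# Transcribe every WAV file placed in <project>/Audios.
for wav in sorted(Path("my_project/Audios").glob("*.wav")):
    segments, _ = model.transcribe(str(wav), vad_filter=True)
    text = " ".join(s.text.strip() for s in segments)
    print(f"{wav.name}\t{text}")
```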
- Audio Sample Rate: Set the sample rate for audio capture (in Hz).
- Audio Duration: Define the duration of audio samples to capture (in milliseconds; see the capture sketch after this list).
- Transcribe: Choose whether to transcribe captured audio (Yes/No).
- VAD: Enable or disable VAD in transcription (Yes/No).
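To illustrate what the first two parameters control, here is a minimal capture sketch using the `sounddevice` library; that library choice is an assumption for illustration, not necessarily the backend the app uses:

```python
import sounddevice as sd
from scipy.io import wavfile

SAMPLE_RATE = 16000   # Hz (the app's default)
DURATION_MS = 5000    # milliseconds (the app's default)

frames = int(SAMPLE_RATE * DURATION_MS / 1000)
# Record mono audio from the default input device and block until done.
audio = sd.rec(frames, samplerate=SAMPLE_RATE, channels=1, dtype="int16")
sd.wait()
wavfile.write("sample.wav", SAMPLE_RATE, audio)
```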
- View Dataset: Opens a new window to view the generated dataset.
- Refresh Dataset: Refreshes the dataset; use this if you changed `metadata.csv`.
- Delete Entry: Deletes the last recorded entry from the dataset.
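For intuition, a hypothetical sketch of what Delete Entry amounts to at the file level, assuming the `metadata.csv` layout sketched earlier; the app's actual implementation may differ:

```python
import csv
from pathlib import Path

def delete_last_entry(project_dir: str) -> None:
    """Drop the last row from metadata.csv and remove its audio file (hypothetical helper)."""
    path = Path(project_dir) / "metadata.csv"
    with path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    if len(rows) <= 1:          # header only, nothing to delete
        return
    header, *entries = rows
    last = entries.pop()
    # The first column is assumed to hold the relative audio file path.
    (Path(project_dir) / last[0]).unlink(missing_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([header] + entries)
```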
To export the final dataset as a Hugging Face 🤗 Dataset, use the provided Command-Line Interface (CLI). See [https://huggingface.co/docs/datasets/audio_dataset].
You can log in through the UI by providing your HF token [https://huggingface.co/docs/hub/security-tokens].
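For reference, a minimal sketch of the Hugging Face side of the export, following the audiofolder layout from the linked docs (`metadata.csv` with a `file_name` column next to the audio files); the token and repository name are placeholders:

```python
# Requires: pip install "datasets[audio]" huggingface_hub
from datasets import load_dataset
from huggingface_hub import login

# Authenticate with your HF access token (placeholder value).
login(token="hf_xxx")

# audiofolder expects a metadata.csv inside data_dir whose `file_name`
# column points at the audio files, as described in the linked HF docs.
ds = load_dataset("audiofolder", data_dir="my_project")

# "username/my-speech-dataset" is a placeholder repository name.
ds.push_to_hub("username/my-speech-dataset")
```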
Depending on community interest or necessity, these features will be merged:
- Adding a new transcription engine or more transcription configuration options;
- Adding more metadata to the Dataset, such as speaker and file type information;
- Export in Kaldi ☕ dataset format;
- Adding loading bars for the dataset batch transcription;
- New window to train Whisper with the new pseudo-synthetic dataset (on request; contact me if you need this solution).
Contributions to this project are welcome! If you'd like to contribute, please follow the standard GitHub workflow:
- Fork the repository.
- Create a new branch for your feature (`git checkout -b feature/your-feature`).
- Commit your changes (`git commit -am 'Add some feature'`).
- Push to the branch (`git push origin feature/your-feature`).
- Create a new Pull Request.
For any inquiries or collaboration, please contact [[email protected]]. I would be thankful to be cited in datasets created with this tool.