Simple Dataset generator? #122
The validation file should have the same format as sample_dataset.csv. Once you have generated the whole dataset and manually split it into a large training set and a small validation set, you can place the respective file IDs into the two CSVs.
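To make the split step concrete, here is a minimal sketch that takes one CSV covering the whole dataset and writes a training CSV and a validation CSV with the same header. The function name `split_dataset_csv`, the 10% validation fraction, and the `|` delimiter are assumptions for illustration, not something confirmed in this thread; check sample_dataset.csv for the actual delimiter and columns before using it.

```python
import csv
import random


def split_dataset_csv(full_csv, train_csv, val_csv,
                      val_fraction=0.1, seed=0, delimiter="|"):
    """Split one dataset CSV into a training CSV and a validation CSV.

    The header row is copied to both outputs so each file keeps the same
    format as the input. The delimiter is an assumption; adjust it to
    match sample_dataset.csv.
    """
    with open(full_csv, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f, delimiter=delimiter))
    header, data = rows[0], rows[1:]

    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(data)

    # Keep at least one validation row when there is more than one row.
    n_val = max(1, round(len(data) * val_fraction)) if len(data) > 1 else 0
    val_rows, train_rows = data[:n_val], data[n_val:]

    for path, subset in ((train_csv, train_rows), (val_csv, val_rows)):
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f, delimiter=delimiter)
            writer.writerow(header)
            writer.writerows(subset)
    return len(train_rows), len(val_rows)
```

Calling `split_dataset_csv("full_dataset.csv", "sample_dataset.csv", "sample_val_dataset.csv")` would overwrite the two output files, so point it at copies first. The input file name here is hypothetical.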
@vatsalaggarwal Really not sure what that means...
Hey @MethanJess, sorry for the late reply. I've just followed a process similar to the one @vatsalaggarwal pointed out for putting together the datasets, but I don't have any special generators of my own. If you run into any issues putting together a useful data pipeline, let us know and we'll see if we can help!
Hey @lucapericlp, I found this repository: https://github.com/daswer123/xtts-webui
Hi, I already know there's Speech Dataset Generator.
However, it's way too bloated with features, and I couldn't get it to work on my system.
So, does anyone have a simple script that splits an audio file into segments, converts the audio to the right sample rate, and then uses WhisperX large-v3 to transcribe the segments to make "sample_dataset.csv" and "sample_val_dataset.csv" (and anything else that's needed)?
I tried making my own but I have no idea how to make the validation file thing...
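For the segmenting step only, a rough sketch using Python's standard `wave` module might look like the following. It assumes the input is already an uncompressed PCM WAV file, and it cuts at fixed intervals, so it can cut in the middle of a word; splitting on silence would need a different approach. The function name `split_wav` and the 10-second default are made up for illustration. Resampling and WhisperX transcription are not covered here.

```python
import wave
from pathlib import Path


def split_wav(src_path, out_dir, segment_seconds=10.0):
    """Cut a PCM WAV file into consecutive fixed-length segments.

    Returns the list of written segment paths. The last segment may be
    shorter than segment_seconds. No resampling is performed.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        frames_per_segment = int(segment_seconds * src.getframerate())
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break  # end of file reached
            out_path = out_dir / f"{Path(src_path).stem}_{index:04d}.wav"
            with wave.open(str(out_path), "wb") as dst:
                # Same channels, sample width, and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
            index += 1
    return written
```

Each returned path could then go through whatever resampling and transcription tools you choose, with the results written out as rows of the dataset CSV.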