Keeping the same speaker in different files #777
wallaceblaia changed the title from "Mantendo o mesmo falante em arquivos diferente" to "Keeping the same speaker in different files" on Apr 11, 2024
Hey @wallaceblaia, did you find a solution for this?
First off, thank you for your fantastic work here. I am working on a project that translates and dubs YouTube live streams in near real time. I've managed to get the delay down to 3 minutes, but I'd like to reduce it further.
In my implementation, I capture the live stream and split it into segments of approximately 1 minute each; the length varies because I only cut between words, never in the middle of speech. After processing each segment with Demucs, I send it to the WhisperX pipeline. However, the speaker labels are not consistent from one audio file to the next. Is there a way to preserve the speakers' embedding data across multiple audio files that contain the same speakers? I use the speaker labels (flags) returned by diarization to choose the voice when dubbing into another language, but each file currently gives me different labels for the same speaker.
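To make the request concrete, here is a rough sketch of the kind of cross-file matching I have in mind. It is not WhisperX code: `SpeakerRegistry`, `assign`, the `GLOBAL_xx` IDs, and the 0.7 threshold are all hypothetical, and it assumes I can somehow get one embedding vector per local speaker for each segment (how to extract that from the pipeline is exactly what I am asking). Each local label is matched greedily to the closest stored centroid by cosine similarity, and a new global ID is created when nothing is similar enough.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SpeakerRegistry:
    """Hypothetical helper: maps per-file speaker labels to stable global IDs."""

    def __init__(self, threshold=0.7):
        self.threshold = threshold  # assumed value; would need tuning on real data
        self.centroids = {}         # global ID -> running mean embedding
        self.counts = {}            # global ID -> number of embeddings averaged

    def assign(self, local_embeddings):
        """Take {local_label: vector} for one segment, return {local_label: global_id}."""
        mapping = {}
        used = set()  # a global ID is matched at most once per segment
        for label, emb in local_embeddings.items():
            best_id, best_sim = None, self.threshold
            for gid, centroid in self.centroids.items():
                if gid in used:
                    continue
                sim = cosine(emb, centroid)
                if sim >= best_sim:
                    best_id, best_sim = gid, sim
            if best_id is None:
                # No known speaker is similar enough: register a new one.
                best_id = f"GLOBAL_{len(self.centroids):02d}"
                self.centroids[best_id] = list(emb)
                self.counts[best_id] = 1
            else:
                # Fold the new embedding into the running mean for that speaker.
                n = self.counts[best_id]
                old = self.centroids[best_id]
                self.centroids[best_id] = [(c * n + e) / (n + 1) for c, e in zip(old, emb)]
                self.counts[best_id] = n + 1
            used.add(best_id)
            mapping[label] = best_id
        return mapping


# Toy 3-dimensional vectors stand in for real speaker embeddings.
registry = SpeakerRegistry(threshold=0.7)
first = registry.assign({"SPEAKER_00": [1.0, 0.0, 0.0], "SPEAKER_01": [0.0, 1.0, 0.0]})
# In the second segment the diarizer happens to swap the local labels.
second = registry.assign({"SPEAKER_00": [0.1, 0.9, 0.0], "SPEAKER_01": [0.95, 0.05, 0.0]})
print(first)
print(second)
```

The matching is greedy, so with many similar voices an optimal assignment (for example the Hungarian algorithm) would probably be safer. With something like this, the dubbing step could key voices on the global ID instead of the per-file label.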