threshold.Silence fails for batched signals #19
threshold.Silence crashes if the batch dimension of the input tensor is greater than 1. This seems to be because loudness is calculated by converting the tensor to NumPy and using librosa. The crash itself happens at line 37 of loudness.py, which tries to squeeze dimension 0. NumPy cannot squeeze an axis whose size is not 1.

Librosa doesn't support batches. However, calculating loudness only involves an STFT, logs, and means, so it doesn't seem hard to switch to the torch versions of these functions.

The rest of the package seems to work fine with batched inputs, so this might be a worthwhile change.

Given a single audio file (the batch size is always 1), CREPE creates many batches internally by chunking. Once you have the entire pitch and periodicity sequence for the audio file, you decode and threshold as needed. But given that the input is of shape ...

What you are mentioning is useful for speed-ups: computing loudness on the GPU could be nice for that. However, loudness is extremely quick to compute, so it's not a priority of mine. And I have a solution coming soon that handles silence without that additional step =)
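The issue suggests computing loudness with torch ops so that batches work. Below is a minimal sketch of that idea. It is not torchcrepe's implementation. It assumes audio already at 16 kHz with shape (batch, samples) and a 1024-sample window. The constants REF_DB = 20 and MIN_DB = -100, the -60 dB silence threshold, and the names a_weighted_loudness and silence_mask are all hypothetical.

The dB conversion mirrors the defaults of librosa.amplitude_to_db (amin 1e-5, top_db 80), with one change: the top_db floor is taken per batch item, so signals in the same batch don't affect each other.

```python
import torch

# Hypothetical constants for this sketch; torchcrepe's actual values may differ.
SAMPLE_RATE = 16000   # Hz; audio is assumed to already be at this rate
WINDOW_SIZE = 1024    # STFT size in samples
REF_DB = 20.0         # reference level subtracted from the A-weighting curve
MIN_DB = -100.0       # floor applied to the weighted dB values


def a_weighting(frequencies: torch.Tensor, min_db: float = -80.0) -> torch.Tensor:
    """Standard A-weighting curve in dB (same formula as librosa.A_weighting)."""
    f_sq = frequencies ** 2
    c = torch.tensor([12194.217, 20.598997, 107.65265, 737.86223],
                     dtype=frequencies.dtype, device=frequencies.device) ** 2
    weights = 2.0 + 20.0 * (
        torch.log10(c[0])
        + 2.0 * torch.log10(f_sq)
        - torch.log10(f_sq + c[0])
        - torch.log10(f_sq + c[1])
        - 0.5 * torch.log10(f_sq + c[2])
        - 0.5 * torch.log10(f_sq + c[3])
    )
    # log10(0) at the DC bin gives -inf; clamp it to min_db
    return weights.clamp(min=min_db)


def a_weighted_loudness(audio: torch.Tensor, hop_length: int = SAMPLE_RATE // 100) -> torch.Tensor:
    """Per-frame A-weighted loudness for audio of shape (batch, samples).

    Returns shape (batch, frames). There is no NumPy round-trip, so the result
    stays on audio's device and the batch dimension is never squeezed.
    """
    window = torch.hann_window(WINDOW_SIZE, dtype=audio.dtype, device=audio.device)
    stft = torch.stft(audio, n_fft=WINDOW_SIZE, hop_length=hop_length,
                      win_length=WINDOW_SIZE, window=window, center=True,
                      pad_mode="constant", return_complex=True)
    # stft: (batch, WINDOW_SIZE // 2 + 1, frames)

    # Amplitude to dB with ref=1 and amin=1e-5 (i.e. a power floor of 1e-10)
    db = 10.0 * torch.log10(torch.clamp(stft.abs() ** 2, min=1e-10))
    # top_db=80 floor, computed per batch item so items stay independent
    peak = db.amax(dim=(1, 2), keepdim=True)
    db = torch.maximum(db, peak - 80.0)

    # Frequency of each STFT bin, then the A-weighting offset for each bin
    freqs = torch.fft.rfftfreq(WINDOW_SIZE, d=1.0 / SAMPLE_RATE, dtype=torch.float64)
    weights = (a_weighting(freqs) - REF_DB).to(dtype=audio.dtype, device=audio.device)

    weighted = (db + weights[None, :, None]).clamp(min=MIN_DB)
    # Average over frequency bins -> (batch, frames)
    return weighted.mean(dim=1)


def silence_mask(loudness: torch.Tensor, threshold_db: float = -60.0) -> torch.Tensor:
    """True where a frame is quieter than threshold_db (hypothetical default)."""
    return loudness < threshold_db
```

Consider a (4, 16000) batch with the default 10 ms hop (160 samples). With center=True, this gives a loudness tensor of shape (4, 101), and silence_mask returns a boolean mask of the same shape. Such a mask could be used to zero out periodicity frame by frame across the whole batch.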