threshold.Silence fails for batched signals #19

Open
vicpc00 opened this issue Dec 24, 2022 · 1 comment

vicpc00 commented Dec 24, 2022

threshold.Silence crashes when the batch dimension of the input tensor is greater than 1. The cause seems to be that loudness is computed by converting the tensors to NumPy and calling librosa. The crash itself happens on line 37 of loudness.py, which tries to squeeze the 0th dimension.

Librosa doesn't support batches, but since calculating the loudness only involves an STFT, logs, and means, it doesn't seem hard to change it to use the torch versions of these functions.
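A rough sketch of what a batched, torch-only loudness computation might look like is below. It is not the package's actual implementation: the function name `batched_loudness`, its parameters, and the default values are assumptions made for illustration, and it skips any perceptual weighting the existing librosa-based code may apply.

```python
import torch


def batched_loudness(audio, window_size=1024, hop_length=160, min_db=-100.0):
    """Per-frame loudness in dB for audio of shape (batch, samples).

    Hypothetical helper: a simplified sketch with no perceptual weighting.
    """
    window = torch.hann_window(window_size, device=audio.device)

    # torch.stft accepts (batch, samples) and returns (batch, freqs, frames)
    stft = torch.stft(
        audio,
        n_fft=window_size,
        hop_length=hop_length,
        win_length=window_size,
        window=window,
        center=True,
        return_complex=True,
    )
    magnitude = stft.abs()

    # Convert to decibels, clamping first to avoid log(0)
    db = 20.0 * torch.log10(torch.clamp(magnitude, min=1e-10))
    db = torch.clamp(db, min=min_db)

    # Average over the frequency axis -> (batch, frames)
    return db.mean(dim=1)
```

Because every step operates on the trailing dimensions, the batch dimension is carried through unchanged and no squeeze is needed. The result could then be compared elementwise against a dB threshold to mark silent frames.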

Since the rest of the package works fine with batched inputs (or at least seems to), this might be a worthwhile change.

@maxrmorrison
Owner

Given a single audio file (batch size always 1), CREPE creates many batches via chunking. Then, once you have the entire pitch and periodicity sequence for the audio file, you decode and threshold as needed. But given that the input is of shape (1, samples) and not (batch, samples), thresholding and decoding should operate on the actual output of that batched process, which is (1, int(1 + samples // hopsize)). So you'll never have a batch size greater than one for the output pitch and periodicity, even if you use multiple audio files. All batching is done internally. And it's rare to pass in multiple audio files at once as (batch, samples) as it assumes that all items in the batch have the same number of samples.
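The shape arithmetic above can be illustrated with a small sketch. The helper name `output_shape`, the sample counts, and the hop size of 160 are hypothetical, chosen only for illustration; the frame-count formula is the one quoted in the comment.

```python
def output_shape(samples, hopsize):
    """Shape of the pitch/periodicity output for one audio file.

    Hypothetical helper: the leading dimension is always 1 because
    chunking into batches happens internally.
    """
    return (1, int(1 + samples // hopsize))


# Two files of different lengths (made-up sample counts)
short_file = output_shape(16000, 160)
long_file = output_shape(24000, 160)

# The frame counts differ, so the two outputs cannot be stacked
# into one (batch, frames) tensor without padding or trimming.
can_stack = short_file[1] == long_file[1]
```

This is why stacking several files as (batch, samples) only works when every file has the same number of samples.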

What you are mentioning is useful for speed-ups; computing loudness on GPU could be nice for that. However, the loudness is extremely quick to compute, so it's not a priority of mine. And I have a solution coming soon that handles silence without that additional step =)
