Constant Q transforms do not allow STFT operations without zero padding for chunked audio processing. #1788
Comments
This is quite a bit more complex than it is in, say, stft. The core VQT function has a recursive downsampling step for computing the response at each octave (wavelet pyramid). Most of the resamplers we support do not provide a streaming / stateful block-processing API, so there isn't an obvious way to thread a stream generator through the function. One could imagine just working with blocks and chopping off the padding cleverly like we do for STFT. However, this will generally not provide consistent results compared to a full signal analysis because the resamplers we use typically have zero phase (which is good for our purposes), and this tends to imply at least some forward lookahead if not a fully bidirectional pass. (For context, the original MATLAB code used a bidirectional Butterworth filter for this, so each downsample depended on the entire signal.)
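The lookahead point above can be illustrated with a toy example. The following is a minimal sketch, not librosa code: it uses a centered moving average as a stand-in for a zero-phase resampling filter (an assumption made purely for illustration) and compares block-wise filtering with and without extra context samples against filtering the whole signal at once.

```python
def zero_phase_smooth(x, half_width):
    """Centered (zero-phase) moving average; samples outside x count as zero."""
    n = len(x)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(-half_width, half_width + 1):
            j = i + k
            if 0 <= j < n:
                acc += x[j]
        out.append(acc / (2 * half_width + 1))
    return out


def blockwise(x, block, half_width, context):
    """Filter x block by block, reading `context` extra samples on each side."""
    out = []
    for start in range(0, len(x), block):
        stop = min(start + block, len(x))
        lo = max(0, start - context)
        hi = min(len(x), stop + context)
        y = zero_phase_smooth(x[lo:hi], half_width)
        # Keep only the samples that belong to this block.
        out.extend(y[start - lo : start - lo + (stop - start)])
    return out


# Toy signal (hypothetical data).
signal = [float((7 * i) % 13) for i in range(64)]
full = zero_phase_smooth(signal, half_width=3)
naive = blockwise(signal, block=16, half_width=3, context=0)
padded = blockwise(signal, block=16, half_width=3, context=3)
```

Here `padded` matches `full` because the context covers the filter's half-width, while `naive` differs near every block boundary. A bidirectional IIR filter has no finite half-width, so in that case the context can only cover its practical impulse-response length.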
Ah yes, I saw the recursive downsampling in there for the VQT. The effective window size would then depend on the resampling method also... Although the above suggestions could be applied to the In any case, I think some lookahead is acceptable. My current use case is not near-real-time stream processing, but chunked processing of very large audio files (e.g., several hours). As such, a lookahead of several seconds is acceptable, and is how I currently handle resampling elsewhere. I'll handle this for now by passing in additional lookback / lookforward samples that are beyond the practical IR length of any IIR / FIR filter and line up with hop boundaries, then "cleverly" chop off the extraneous frames. It would be a nice extra to have an interface that helped with this and provided lookback / lookforward lengths, either for the user to manage or through a mode that stores overlapping state internally. But I appreciate that is a much bigger change.
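The lookback / lookforward workaround described above can be sketched as a chunk planner. This is a hypothetical helper, not part of librosa: `plan_chunks` and its parameters are names introduced here for illustration. It assumes a centered transform that emits `1 + n // hop_length` frames, with frame `t` located at sample `t * hop_length`.

```python
from math import ceil


def plan_chunks(n_samples, chunk_frames, hop_length, margin_samples):
    """Yield (read_start, read_stop, drop_head, keep) for each chunk.

    read_start / read_stop: sample range to load, including the margins.
    drop_head: number of leading frames of the chunk's transform to discard.
    keep: number of frames to retain after dropping the head.
    """
    # Round the margin up so every read_start stays on a hop boundary.
    margin = ceil(margin_samples / hop_length) * hop_length
    # Assumed frame count of a centered analysis of the full signal.
    n_frames = 1 + n_samples // hop_length
    for first in range(0, n_frames, chunk_frames):
        keep = min(chunk_frames, n_frames - first)
        start = first * hop_length
        stop = (first + keep) * hop_length
        read_start = max(0, start - margin)
        read_stop = min(n_samples, stop + margin)
        drop_head = (start - read_start) // hop_length
        yield read_start, read_stop, drop_head, keep


# Example with hypothetical sizes: 10000 samples, 8 frames per chunk.
plan = list(plan_chunks(10000, chunk_frames=8, hop_length=512, margin_samples=1000))
```

For each tuple, one would transform samples `read_start:read_stop` and keep frames `drop_head:drop_head + keep`. The retained frames only approximate a full-signal analysis if `margin_samples` exceeds both the longest analysis window and the practical impulse-response length of the resampler.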
Yeah, it could definitely be implemented in pseudo-cqt, since that's just an stft + basis projection anyway. I agree that some lookahead is fine here, but the issue is going to be maintaining state through the API. When we have things like IIR filters that propagate state across blocks (pcen, preemphasis), we manage this by having an additional return variable to initialize the next call. This isn't my favorite API choice, but it is done for consistency with the scipy style (see lfilter). I think really the best way to go about this kind of thing would be to use generators instead of functions, which would allow for internal state to be preserved without expanding the API. I think this kind of thing is doable with the soxr backend, but I haven't looked into it carefully. A generator interface has been in my mind for a while now though, and it might make it into a plan for librosa 2.0, provided it doesn't conflict with our other plans (array API mainly).
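The two API styles contrasted above can be sketched with a toy first-order filter. This is an illustrative sketch only: `preemphasis_block` and `preemphasis_stream` are hypothetical names, and the filter starts from a zero state, which is not necessarily how librosa initializes its own preemphasis.

```python
def preemphasis_block(block, coef=0.97, prev=0.0):
    """Explicit-state style: y[n] = x[n] - coef * x[n-1].

    Returns the filtered block and the state to pass to the next call.
    """
    out = []
    for x in block:
        out.append(x - coef * prev)
        prev = x
    return out, prev


def preemphasis_stream(coef=0.97):
    """Generator style: the state lives inside the generator."""
    prev = 0.0
    block = yield  # wait for the first block
    while True:
        out, prev = preemphasis_block(block, coef, prev)
        block = yield out


# Toy signal (hypothetical data), processed in blocks of 4 samples.
signal = [float(i % 5) for i in range(12)]
blocks = [signal[i : i + 4] for i in range(0, len(signal), 4)]
full, _ = preemphasis_block(signal)

# Style 1: thread the state through return values.
explicit, state = [], 0.0
for b in blocks:
    y, state = preemphasis_block(b, prev=state)
    explicit.extend(y)

# Style 2: let the generator remember the state.
stream = preemphasis_stream()
next(stream)  # prime the generator
streamed = []
for b in blocks:
    streamed.extend(stream.send(b))
```

Both variants reproduce the full-signal result. The generator keeps the call signature free of state arguments, at the cost of requiring the caller to prime it and use `send`.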
Is your feature request related to a problem? Please describe.
Currently all constant Q operations call the stft operation with parameter center=True, which forces some sort of padding (e.g., zeros, or a reflection). This prevents the computation of the equivalent of a very large CQT in chunks, as this padding will be inserted at each chunk point. This center parameter should be configurable through the user interface so that "gapless" chunking by the user is possible, where the user can provide each segment of audio as a continuation of the former. This would also require a simple interface to get the equivalent maximum CQT window size, so that a user can chunk the audio and align CQT windows appropriately.
Describe the solution you'd like
A center parameter for the cqt, vqt and pseudo_cqt operations.
An interface for the equivalent maximum CQT window size (e.g., cqt_window_size(sample_rate, fmin, num_octaves, bins_per_octave, hop_length)) so that this is easily accessible to the user.
Describe alternatives you've considered
Currently to do this, one must modify private functions of the constantq module such as __cqt_response, which is bad practice.
Additional context
N/A
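For illustration, the requested window-size helper might look roughly like the sketch below. It is a guess at the intent, not an existing librosa function. It assumes the textbook constant-Q relation Q = 1 / (2^(1/bins_per_octave) - 1), with the longest filter at fmin, and rounds up to a whole number of hops. The `resampler_half_length` argument is a hypothetical addition that pads for each downsampling stage; librosa's actual filter lengths and resampler delays may differ.

```python
from math import ceil


def cqt_window_size(sample_rate, fmin, num_octaves, bins_per_octave,
                    hop_length, resampler_half_length=0):
    """Estimate the longest analysis window, in samples at the input rate.

    Assumption: filter length at frequency f is Q * sample_rate / f, with
    Q = 1 / (2 ** (1 / bins_per_octave) - 1). The lowest bin (fmin) is longest.
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    longest = q * sample_rate / fmin
    # Hypothetical allowance for resampler lookahead: stage i runs at
    # sample_rate / 2**i, so its half-length spans 2**i input samples each.
    stages = max(num_octaves - 1, 0)
    extra = resampler_half_length * (2 ** stages - 1)
    # Round up to a whole number of hops so chunks align with frames.
    return ceil((longest + extra) / hop_length) * hop_length


# Example with hypothetical settings: 7 octaves from about C1 at 22050 Hz.
size = cqt_window_size(22050, 32.7, 7, 12, 512)
size_with_margin = cqt_window_size(22050, 32.7, 7, 12, 512,
                                   resampler_half_length=64)
```

A caller could pass such a value as the lookback / lookforward margin when splitting a long file into chunks.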