Is your feature request related to a problem? Please describe.
When we implemented multichannel support (#1130), one subtle issue that popped up is the calculation of the reference amplitude in decibel scaling. When the reference is given as a function (e.g., ref=np.max), it is applied globally to the entire input to compute a single scalar value (librosa/librosa/core/spectrum.py, lines 1866 to 1868 in 5ca70f5):
# User supplied a function to calculate reference power
ref_value = ref(magnitude)
This makes sense in the single-channel case. In the multichannel case, however, it can break the assumption of channel independence: the reference is taken over all channels at once, so each channel's output depends on the others. This was first noticed in the MFCC calculation, where computing MFCCs (which depends on dB scaling) on a multichannel input can produce different results than computing them independently on each channel. It's also popping up in #1766 by way of onset detection (which again depends on dB scaling).
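To make the problem concrete, here is a minimal sketch using NumPy only. The helper `to_db_global` is a simplified stand-in written for this illustration, not librosa's actual implementation, and the input is random data shaped like (channel, frequency, time) with the second channel made much quieter.

```python
import numpy as np


def to_db_global(S, ref=np.max, amin=1e-10):
    """Simplified dB scaling with one global reference (illustrative only)."""
    ref_value = ref(S)  # a single scalar computed over every axis, channels included
    log_spec = 10.0 * np.log10(np.maximum(amin, S))
    return log_spec - 10.0 * np.log10(np.maximum(amin, ref_value))


rng = np.random.default_rng(0)
S = rng.random((2, 5, 8))  # fake power spectra: (channel, frequency, time)
S[1] *= 0.01               # make the second channel much quieter

# Process both channels at once, then each channel on its own
joint = to_db_global(S)
independent = np.stack([to_db_global(channel) for channel in S])

# The loud channel sets the global reference, so it is unaffected;
# the quiet channel is shifted relative to its independent result.
loud_matches = np.allclose(joint[0], independent[0])
quiet_matches = np.allclose(joint[1], independent[1])
print(loud_matches, quiet_matches)
```

The quiet channel ends up offset by a constant number of decibels, which is exactly the kind of cross-channel dependence described above.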
Describe the solution you'd like
Instead of aggregating globally, we could aggregate only over the trailing dimensions. Passing keepdims=True to the reference function would keep the reduced axes, so the result broadcasts against the input, which should give channel-wise independent processing. This would be a breaking change, but I expect it would be more in line with expected behavior, so it could be worth doing.
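A rough sketch of the idea, assuming the input is laid out as (channel, frequency, time) and the reference function accepts `axis` and `keepdims` the way `np.max` does:

```python
import numpy as np

amin = 1e-10
rng = np.random.default_rng(0)
S = rng.random((2, 5, 8))  # fake power spectra: (channel, frequency, time)
S[1] *= 0.01               # second channel is much quieter

# Reduce over the two trailing axes only; keepdims=True leaves shape (2, 1, 1),
# so the reference broadcasts back against S one channel at a time.
ref_value = np.max(S, axis=(-2, -1), keepdims=True)

S_db = 10.0 * np.log10(np.maximum(amin, S))
S_db = S_db - 10.0 * np.log10(np.maximum(amin, ref_value))

# Each channel now peaks at 0 dB relative to its own maximum
per_channel_peak = S_db.max(axis=(-2, -1))
print(ref_value.shape, per_channel_peak)
```

With the reference computed this way, scaling a multichannel input gives the same result as scaling each channel separately.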
There is a subtlety here, though: it's not always clear how many trailing dimensions should be included. Most of the time it would be two (time and frequency); sometimes it could be more (e.g., if patch-processing spectra) or fewer (dB scaling single-channel measurements). We'd probably need some API to expose the aggregation dimensions, and that would require some thought.
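One possible shape for such an API, purely as a starting point for discussion: the function name `to_db` and the `ref_axes` parameter below are hypothetical and do not exist in librosa. The sketch also assumes that a callable `ref` accepts `axis` and `keepdims`, which holds for `np.max` and `np.median` but not for arbitrary user functions.

```python
import numpy as np


def to_db(S, ref=np.max, ref_axes=(-2, -1), amin=1e-10):
    """Hypothetical dB scaling with configurable aggregation axes.

    ref_axes=None reproduces the global (scalar) reference;
    a tuple of axes aggregates over those axes only.
    """
    S = np.asarray(S)
    if callable(ref):
        if ref_axes is None:
            ref_value = ref(S)  # global scalar reference
        else:
            # Assumes ref supports axis= and keepdims= like np.max
            ref_value = ref(S, axis=ref_axes, keepdims=True)
    else:
        ref_value = np.abs(ref)  # fixed numeric reference
    log_spec = 10.0 * np.log10(np.maximum(amin, S))
    return log_spec - 10.0 * np.log10(np.maximum(amin, ref_value))


rng = np.random.default_rng(0)
multi = rng.random((2, 5, 8))       # (channel, frequency, time)
patches = rng.random((2, 3, 5, 8))  # (channel, patch, frequency, time)
levels = rng.random((4, 8))         # (channel, time), no frequency axis

db_multi = to_db(multi)                                # two trailing axes (default)
db_patches = to_db(patches, ref_axes=(-3, -2, -1))     # more than two
db_levels = to_db(levels, ref_axes=(-1,))              # fewer than two
db_global = to_db(multi, ref_axes=None)                # global reference
```

Whether the default should be the two trailing axes or the global reference is the breaking-change question raised above; the sketch picks the former only to illustrate the proposal.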