Defaults for Multiscale STFT loss #38

turian · 2022-09-12T04:42:14Z

        fft_sizes=[1024, 2048, 512],
        hop_sizes=[120, 240, 50],
        win_lengths=[600, 1200, 240],

These are the defaults provided. What sample rate are they intended for?

(Just curious, how did you choose them? But desired sample rate is more important for me.)

csteinmetz1 · 2022-09-13T23:24:56Z

This is a good question and likely should be added to the docstring.

These are the values from the paper we based the implementation on: https://arxiv.org/abs/1910.11480. According to the paper, they are meant for audio at 24 kHz. I generally do not use these defaults in my own setups, most of which run at a higher sample rate. DDSP opted for a larger number of window and frame sizes, which perhaps mitigates some of the variability across sample rates.
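For anyone working at a different sample rate, here is a rough sketch of one way to rescale the defaults so that the windows and hops cover about the same duration in milliseconds as they do at 24 kHz. The helper names (`to_ms`, `scale_stft_params`) are hypothetical and not part of any library, and rounding the FFT size up to a power of two is an assumption made here, not something the paper prescribes.

```python
# Defaults quoted in this issue, reported to target 24 kHz audio.
REF_SR = 24_000
FFT_SIZES = [1024, 2048, 512]
HOP_SIZES = [120, 240, 50]
WIN_LENGTHS = [600, 1200, 240]


def to_ms(samples: int, sr: int) -> float:
    """Duration of `samples` samples at sample rate `sr`, in milliseconds."""
    return 1000.0 * samples / sr


def _rescale(n: int, target_sr: int, ref_sr: int) -> int:
    """Scale a sample count by target_sr / ref_sr, rounding half up (integer math)."""
    return max(1, (n * target_sr + ref_sr // 2) // ref_sr)


def _next_pow2(n: int) -> int:
    """Smallest power of two that is >= n."""
    return 1 << (n - 1).bit_length()


def scale_stft_params(target_sr: int, ref_sr: int = REF_SR) -> dict:
    """Hypothetical helper: keep window/hop durations roughly constant across sample rates."""
    hops = [_rescale(h, target_sr, ref_sr) for h in HOP_SIZES]
    wins = [_rescale(w, target_sr, ref_sr) for w in WIN_LENGTHS]
    # Assumption: scale the FFT size too, round up to a power of two,
    # and never let it fall below the window length.
    ffts = [
        max(_next_pow2(_rescale(f, target_sr, ref_sr)), _next_pow2(w))
        for f, w in zip(FFT_SIZES, wins)
    ]
    return {"fft_sizes": ffts, "hop_sizes": hops, "win_lengths": wins}


if __name__ == "__main__":
    # At 24 kHz the default windows span 25 ms, 50 ms and 10 ms.
    print([to_ms(w, REF_SR) for w in WIN_LENGTHS])
    print(scale_stft_params(48_000))
```

The resulting lists can be passed as `fft_sizes`, `hop_sizes` and `win_lengths` in place of the defaults. Whether constant duration is the right invariant depends on the task, so treat this as a starting point.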

turian · 2022-09-13T23:45:37Z

Yeah. I guess I take a more hardcore view here: NO defaults should be provided, and the docstring should instead give a few example configurations, each with its intended sample rate and a citation. The way it is now, it's a bit too easy to footgun yourself, I think.
