Difference between kaldifeat mfcc feature and torchaudio mfcc feature #87

binhtranmcs · 2023-12-07T03:33:36Z

Currently, I am using torchaudio.transforms.MFCC to compute features. Now I need to use the C++ API of kaldifeat, but the extracted features are different. Here is the script I used:

```python
import kaldifeat
import torchaudio
import torch

torch.manual_seed(0)
torch.set_printoptions(precision=3, sci_mode=False)

wave = torch.rand(1, 400)

# torchaudio mfcc
transform = torchaudio.transforms.MFCC(
    sample_rate=16000,
    n_mfcc=13,
    melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 23, "center": False, "window_fn": torch.hann_window},
)
ta_mfcc = transform(wave)[0].transpose(0, 1)

# kaldi compliance mfcc
kaldi_mfcc = torchaudio.compliance.kaldi.mfcc(
    wave * 2**15,
    num_ceps=13,
    num_mel_bins=23,
    use_energy=False,
    window_type="hanning",
)

# kaldifeat mfcc
opts_mfcc = kaldifeat.MfccOptions()
opts_mfcc.use_energy = False
opts_mfcc.num_ceps = 13
opts_mfcc.frame_opts.window_type = "hanning"
opts_mfcc.frame_opts.dither = 0
opts_mfcc.mel_opts.num_bins = 23
mfcc = kaldifeat.Mfcc(opts_mfcc)
kaldifeat_mfcc = mfcc(wave[0] * 2**15)

# One column per implementation: torchaudio, compliance, kaldifeat
ft = torch.cat([ta_mfcc, kaldi_mfcc, kaldifeat_mfcc]).transpose(0, 1)

print(ft)
```

The result is:

```
tensor([[ 92.246, 115.379, 115.379],
        [-10.815, -34.377, -34.377],
        [  2.703, -11.685, -11.685],
        [  0.333, -15.649, -15.649],
        [  4.773,  -7.279,  -7.280],
        [  1.226, -13.743, -13.743],
        [  2.976, -10.609, -10.609],
        [  6.198,  -2.479,  -2.479],
        [  4.769,  -4.193,  -4.193],
        [  5.665,  -0.910,  -0.910],
        [  5.217,  -0.147,  -0.147],
        [  4.096,  -2.355,  -2.355],
        [  5.315,   1.021,   1.021]])
```

The result from torchaudio.compliance.kaldi.mfcc is the same as that of kaldifeat, but different from torchaudio.transforms.MFCC.

Is there a way to configure kaldifeat so that the result is the same as that of torchaudio.transforms.MFCC? Thanks in advance.
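One Kaldi-side stage that torchaudio.transforms.MFCC has no counterpart for is cepstral liftering. Kaldi, and torchaudio.compliance.kaldi.mfcc through its cepstral_lifter=22.0 default, multiplies coefficient i by 1 + 0.5 * Q * sin(pi * i / Q) with Q = 22, so c0 is untouched while the higher coefficients are scaled by up to 12x. The plain-Python sketch below reproduces that formula and shows how to divide it back out. It assumes the Kaldi lifter formula as stated here; it also assumes (unverified) that kaldifeat mirrors Kaldi's cepstral_lifter option, where a value of 0 disables liftering, so check the attribute name against your kaldifeat version.

```python
import math


def kaldi_lifter_coeffs(num_ceps, q=22.0):
    """Kaldi-style cepstral lifter weights: 1 + 0.5 * Q * sin(pi * i / Q)."""
    return [1.0 + 0.5 * q * math.sin(math.pi * i / q) for i in range(num_ceps)]


def apply_lifter(ceps, q=22.0):
    """Scale each cepstral coefficient by its lifter weight (what Kaldi does)."""
    weights = kaldi_lifter_coeffs(len(ceps), q)
    return [c * w for c, w in zip(ceps, weights)]


def remove_lifter(liftered, q=22.0):
    """Divide the weights back out to recover un-liftered coefficients."""
    weights = kaldi_lifter_coeffs(len(liftered), q)
    return [c / w for c, w in zip(liftered, weights)]


weights_13 = kaldi_lifter_coeffs(13)
print([round(w, 3) for w in weights_13])
```

Removing the lifter only accounts for one of several differences; the log type, window shape, and framing still have to match before the columns in the table can agree.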


csukuangfj · 2023-12-07T03:37:26Z

> ta_mfcc = transform(wave)[0].transpose(0, 1)

Is there a reason to not use wave * 32768?

binhtranmcs · 2023-12-07T03:42:24Z

> Is there a reason to not use wave * 32768?

I think torchaudio expects input in the range [-1, 1]. But even with wave * 32768, the results are still different.
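Scaling by 32768 on its own cannot close the gap. Multiplying the waveform by 32768 multiplies every power-spectrum value, and therefore every mel energy, by 32768**2 = 2**30. After the log this becomes the same constant added to every mel bin, and an orthonormal DCT-II maps a constant vector entirely onto c0, so only the first coefficient moves (the same argument holds for torchaudio's dB path, up to its top_db clamp). The sketch below checks this in plain Python on made-up mel energies; the orthonormal DCT-II normalization is, as far as I can tell, the one used by both torchaudio's create_dct(..., norm="ortho") and Kaldi's DCT matrix.

```python
import math


def dct2_ortho(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        total = sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


# Illustrative (made-up) mel energies for 23 bins.
mel_energies = [0.5 + 0.1 * i for i in range(23)]

# Log-mel of the original signal and of the same signal scaled by 32768.
log_mel = [math.log(e) for e in mel_energies]
log_mel_scaled = [math.log(e * 32768 ** 2) for e in mel_energies]

ceps = dct2_ortho(log_mel)[:13]
ceps_scaled = dct2_ortho(log_mel_scaled)[:13]

# Expected: c0 shifts by sqrt(23) * ln(2**30); all other coefficients are unchanged.
expected_c0_shift = math.sqrt(23) * 30 * math.log(2)
diffs = [b - a for a, b in zip(ceps, ceps_scaled)]
print(round(diffs[0], 3), max(abs(d) for d in diffs[1:]))
```

Since the torchaudio and Kaldi columns in the table above differ in every coefficient, not just c0, the remaining mismatch must come from something other than input scaling.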

csukuangfj · 2023-12-07T04:24:10Z

> kaldi_mfcc = torchaudio.compliance.kaldi.mfcc(wave * 2**15, window_type="hanning")
>
> # kaldifeat mfcc
> opts_mfcc = kaldifeat.MfccOptions()
> opts_mfcc.use_energy = False
> opts_mfcc.frame_opts.window_type = "hanning"

Is there a reason to not use the same parameters for torchaudio.transforms.MFCC?
For instance, you use hanning for both kaldifeat and torchaudio.compliance.kaldi.mfcc, but you leave torchaudio.transforms.MFCC to use its default value, though I am not sure whether its default value is hanning or not.

Also, you are using n_mfcc=13, for torchaudio.transforms.MFCC. Is there any reason to not use the same value for kaldifeat and torchaudio.compliance.kaldi.mfcc?

If you want to produce the same features for the same input, please ensure

- you indeed use the same input
- you indeed use the same arguments
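One way to act on this is to write both configurations out side by side and diff them. The sketch below does that with plain dictionaries. The key names are hypothetical labels chosen for the comparison, not option names of either library. The values come from the script in this issue plus library defaults as I understand them (torchaudio MelSpectrogram: f_min=0, periodic Hann window, dB log when log_mels=False; Kaldi: 0.97 pre-emphasis, DC-offset removal, FFT size rounded up to a power of two, low_freq=20 Hz, cepstral_lifter=22). Double-check them against the documentation of the versions you actually use.

```python
# Hypothetical, normalized description of each pipeline for a 16 kHz signal.
# Key names are illustrative labels, not real option names of either library.
TORCHAUDIO_MFCC = {
    "frame_length_samples": 400,   # n_fft=400, win_length defaults to n_fft
    "frame_shift_samples": 160,    # hop_length=160
    "fft_size": 400,               # FFT over the 400-sample frame, no padding
    "num_mel_bins": 23,
    "num_ceps": 13,
    "low_freq_hz": 0.0,            # f_min default
    "window": "hann, periodic",    # torch.hann_window default
    "preemphasis": 0.0,            # no pre-emphasis stage
    "remove_dc_offset": False,     # no DC-removal stage
    "log": "10*log10 (dB)",        # log_mels=False
    "cepstral_lifter": 0.0,        # no liftering
}

KALDI_MFCC = {
    "frame_length_samples": 400,   # 25 ms at 16 kHz
    "frame_shift_samples": 160,    # 10 ms at 16 kHz
    "fft_size": 512,               # round_to_power_of_two
    "num_mel_bins": 23,
    "num_ceps": 13,
    "low_freq_hz": 20.0,
    "window": "hann, symmetric",   # Kaldi "hanning"
    "preemphasis": 0.97,
    "remove_dc_offset": True,
    "log": "natural log",
    "cepstral_lifter": 22.0,
}


def config_diff(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


for key, (ta, kaldi) in config_diff(TORCHAUDIO_MFCC, KALDI_MFCC).items():
    print(f"{key:22s} torchaudio={ta!r:20s} kaldi={kaldi!r}")
```

The table leaves out a subtler difference: as far as I can tell, Kaldi builds its triangular mel filters linearly in the mel domain while torchaudio builds them linearly in Hz, so even with every listed setting aligned the features may not be bit-identical.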

binhtranmcs · 2023-12-07T04:40:48Z

> For instance, you use hanning for both kaldifeat and torchaudio.compliance.kaldi.mfcc, but you leave torchaudio.transforms.MFCC to use its default value, though I am not sure whether its default value is hanning or not.
>
> Also, you are using n_mfcc=13, for torchaudio.transforms.MFCC. Is there any reason to not use the same value for kaldifeat and torchaudio.compliance.kaldi.mfcc?

@csukuangfj, hanning is the default of torchaudio.transforms.MFCC and num_ceps=13 is the default of kaldifeat.

I just updated the Python code above, adding those arguments. The result is unchanged.
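Even with both sides set to "hanning", the two windows are not identical. torch.hann_window, the default window_fn of torchaudio.transforms.MFCC, is periodic by default, 0.5 - 0.5 * cos(2 * pi * n / N), whereas Kaldi's "hanning" window is symmetric, 0.5 - 0.5 * cos(2 * pi * n / (N - 1)). The plain-Python sketch below compares the two for a 400-sample frame.

```python
import math


def hann_periodic(n):
    """Periodic Hann window, matching torch.hann_window(n) with its default periodic=True."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def hann_symmetric(n):
    """Symmetric Hann window, as used by Kaldi's "hanning" window_type."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


frame_len = 400  # 25 ms at 16 kHz
periodic = hann_periodic(frame_len)
symmetric = hann_symmetric(frame_len)
max_gap = max(abs(p - s) for p, s in zip(periodic, symmetric))
print(f"last sample: periodic={periodic[-1]:.6f}, symmetric={symmetric[-1]:.6f}")
print(f"max |difference| = {max_gap:.6f}")
```

If you want torchaudio to use the symmetric variant, MelSpectrogram forwards its wkwargs argument to window_fn, so adding "wkwargs": {"periodic": False} to melkwargs should select it; I have not verified this across torchaudio versions.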

csukuangfj · 2023-12-07T05:45:12Z

~~Please show your complete code after your changes~~

csukuangfj · 2023-12-07T05:46:02Z

Also, have you read and checked the following two points?

> If you want to produce the same features for the same input, please ensure
>
> - you indeed use the same input
> - you indeed use the same arguments
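"Same input" also covers per-frame preprocessing. Before the FFT, Kaldi (and therefore kaldifeat and torchaudio.compliance.kaldi.mfcc) subtracts the frame mean, applies pre-emphasis with coefficient 0.97, multiplies by the window, and zero-pads the frame to the next power of two (512 for a 400-sample frame). torchaudio.transforms.MFCC has no DC-removal or pre-emphasis step and, with n_fft=400, takes a 400-point FFT without padding. The sketch below mirrors those Kaldi steps in plain Python; dithering is omitted because the script sets dither to 0. On the Kaldi side, torchaudio.compliance.kaldi.mfcc exposes remove_dc_offset, preemphasis_coefficient and round_to_power_of_two, so these stages can be switched off there when comparing.

```python
import math


def kaldi_style_frame(frame, preemph=0.97, remove_dc=True, window=None, padded_size=512):
    """Apply Kaldi-like per-frame preprocessing and zero-pad for the FFT.

    Order follows Kaldi: DC removal, pre-emphasis, windowing, zero-padding.
    """
    if padded_size < len(frame):
        raise ValueError("padded_size must be at least the frame length")
    x = [float(v) for v in frame]
    if remove_dc:
        mean = sum(x) / len(x)
        x = [v - mean for v in x]
    if preemph != 0.0:
        # Walk backwards so x[i - 1] still holds its un-emphasized value.
        for i in range(len(x) - 1, 0, -1):
            x[i] -= preemph * x[i - 1]
        # Kaldi treats the first sample as its own predecessor.
        x[0] -= preemph * x[0]
    if window is not None:
        x = [v * w for v, w in zip(x, window)]
    return x + [0.0] * (padded_size - len(x))


def next_power_of_two(n):
    """Smallest power of two >= n (Kaldi's round_to_power_of_two)."""
    p = 1
    while p < n:
        p *= 2
    return p


frame_len = 400
# Symmetric Hann window, as in Kaldi's "hanning".
window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1)) for i in range(frame_len)]
# A 440 Hz tone at 16 kHz with a DC offset, as an illustrative frame.
frame = [math.sin(2 * math.pi * 440 * t / 16000) + 0.1 for t in range(frame_len)]
processed = kaldi_style_frame(frame, window=window, padded_size=next_power_of_two(frame_len))
print(len(processed))
```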

csukuangfj · 2023-12-07T06:00:07Z

I strongly suggest that you have a look at
https://pytorch.org/audio/main/_modules/torchaudio/transforms/_transforms.html#MFCC

You need to find all the arguments of MFCC and compare them with kaldifeat and torchaudio.compliance.kaldi.mfcc.

You need to spend time figuring out the reason by yourself.

For instance, you use the default value log_mels=False for MFCC, which is not correct if you want to get the same features as kaldifeat and torchaudio.compliance.kaldi.mfcc.
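To see why log_mels matters: with log_mels=False, torchaudio.transforms.MFCC converts mel power to decibels, 10 * log10(x), while Kaldi takes the natural log. Since 10 * log10(x) = (10 / ln 10) * ln(x) and the DCT is linear, the dB path scales every cepstral coefficient by about 4.343 (ignoring the top_db clamp and the small floor that AmplitudeToDB also applies). With log_mels=True, torchaudio uses log(x + 1e-6) instead, which is much closer to Kaldi's natural log. The sketch below checks the scaling in plain Python on made-up mel energies.

```python
import math


def dct2_ortho(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    return [
        (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
        * sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(n)
    ]


# Illustrative (made-up) mel power values for 23 bins.
mel_power = [0.2 + 0.3 * i for i in range(23)]

# log_mels=False path (decibels), ignoring top_db clamping and the amin floor.
ceps_db = dct2_ortho([10.0 * math.log10(p) for p in mel_power])[:13]
# Natural-log path, as in Kaldi (no offset, to isolate the effect of the log base).
ceps_ln = dct2_ortho([math.log(p) for p in mel_power])[:13]

factor = 10.0 / math.log(10.0)
print(f"dB/ln factor = {factor:.4f}")
print(max(abs(d - factor * l) for d, l in zip(ceps_db, ceps_ln)))
```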
