Difference between kaldifeat mfcc feature and torchaudio mfcc feature #87

Open
binhtranmcs opened this issue Dec 7, 2023 · 7 comments

@binhtranmcs

binhtranmcs commented Dec 7, 2023

Currently, I am using torchaudio.transforms.MFCC to compute features. Now I need to use the C++ API of kaldifeat, but the features it extracts are different. Here is the script I used to compare them:

import kaldifeat
import torchaudio
import torch

torch.manual_seed(0)
torch.set_printoptions(precision=3, sci_mode=False)

wave = torch.rand(1, 400)

# torchaudio mfcc
transform = torchaudio.transforms.MFCC(
    sample_rate=16000,
    n_mfcc=13,
    melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 23, "center": False, "window_fn": torch.hann_window},
)
ta_mfcc = transform(wave)[0].transpose(0, 1)

# kaldi compliance mfcc
kaldi_mfcc = torchaudio.compliance.kaldi.mfcc(
    wave * 2**15,
    num_ceps = 13,
    num_mel_bins = 23,
    use_energy = False,
    window_type="hanning")

# kaldifeat mfcc
opts_mfcc = kaldifeat.MfccOptions()
opts_mfcc.use_energy = False
opts_mfcc.num_ceps = 13
opts_mfcc.frame_opts.window_type = "hanning"
opts_mfcc.frame_opts.dither = 0
opts_mfcc.mel_opts.num_bins = 23
mfcc = kaldifeat.Mfcc(opts_mfcc)
kaldifeat_mfcc = mfcc(wave[0] * 2**15)

ft = torch.cat([ta_mfcc, kaldi_mfcc, kaldifeat_mfcc]).transpose(0, 1)

print(ft)

The result is:

tensor([[ 92.246, 115.379, 115.379],
        [-10.815, -34.377, -34.377],
        [  2.703, -11.685, -11.685],
        [  0.333, -15.649, -15.649],
        [  4.773,  -7.279,  -7.280],
        [  1.226, -13.743, -13.743],
        [  2.976, -10.609, -10.609],
        [  6.198,  -2.479,  -2.479],
        [  4.769,  -4.193,  -4.193],
        [  5.665,  -0.910,  -0.910],
        [  5.217,  -0.147,  -0.147],
        [  4.096,  -2.355,  -2.355],
        [  5.315,   1.021,   1.021]])

The result from torchaudio.compliance.kaldi.mfcc is the same as that of kaldifeat, but different from torchaudio.transforms.MFCC.
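To put numbers on the mismatch, the columns of the printed tensor can be compared directly. This is a minimal plain-Python sketch that uses only the values printed above; the helper name max_abs_diff and the *_col variable names are made up for illustration.

```python
# Columns of the tensor printed above, copied by hand (3 decimal places).
ta_col = [92.246, -10.815, 2.703, 0.333, 4.773, 1.226, 2.976,
          6.198, 4.769, 5.665, 5.217, 4.096, 5.315]
kaldi_col = [115.379, -34.377, -11.685, -15.649, -7.279, -13.743, -10.609,
             -2.479, -4.193, -0.910, -0.147, -2.355, 1.021]
kaldifeat_col = [115.379, -34.377, -11.685, -15.649, -7.280, -13.743, -10.609,
                 -2.479, -4.193, -0.910, -0.147, -2.355, 1.021]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length lists."""
    return max(abs(x - y) for x, y in zip(a, b))


kaldi_vs_kaldifeat = max_abs_diff(kaldi_col, kaldifeat_col)
ta_vs_kaldi = max_abs_diff(ta_col, kaldi_col)

print(f"compliance.kaldi vs kaldifeat: {kaldi_vs_kaldifeat:.3f}")
print(f"transforms.MFCC vs compliance.kaldi: {ta_vs_kaldi:.3f}")
```

The gap between torchaudio.compliance.kaldi.mfcc and kaldifeat sits at the printed precision, while the gap to torchaudio.transforms.MFCC is larger than 20, which suggests a difference in configuration or algorithm rather than rounding.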

Is there a way to configure kaldifeat so that its output matches that of torchaudio.transforms.MFCC? Thanks in advance.

@csukuangfj
Owner

ta_mfcc = transform(wave)[0].transpose(0, 1)

Is there a reason to not use wave * 32768?

@binhtranmcs
Author

Is there a reason to not use wave * 32768?

I think torchaudio expects input in the range [-1, 1]. Even with wave * 32768, though, the results are still different.
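One way to see why rescaling the input cannot close the gap on its own: under log compression, a constant gain becomes the same additive offset in every mel band, and an orthonormal DCT-II maps a constant offset onto coefficient 0 only. The sketch below is plain Python with made-up mel energies. It assumes natural-log compression and ignores dithering, flooring, and liftering, so it illustrates the arithmetic rather than either library's exact pipeline.

```python
import math


def dct2_ortho(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


# Hypothetical mel-band energies for one frame (illustrative values only).
mel_energies = [0.8, 1.5, 2.1, 0.4, 0.9, 3.2, 1.1, 0.6]
gain = 32768.0  # amplitude gain; power-domain energies scale by gain**2

log_mel = [math.log(e) for e in mel_energies]
log_mel_scaled = [math.log(e * gain**2) for e in mel_energies]

ceps = dct2_ortho(log_mel)
ceps_scaled = dct2_ortho(log_mel_scaled)

# Only the first entry is noticeably non-zero.
diff = [b - a for a, b in zip(ceps, ceps_scaled)]
print([round(d, 6) for d in diff])
```

In the table posted above, coefficients 1 through 12 also differ substantially, so under these assumptions the input scale can account for at most part of the difference in coefficient 0.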

@csukuangfj
Owner

kaldi_mfcc = torchaudio.compliance.kaldi.mfcc(wave * 2**15, window_type="hanning")

# kaldifeat mfcc
opts_mfcc = kaldifeat.MfccOptions()
opts_mfcc.use_energy = False
opts_mfcc.frame_opts.window_type = "hanning"

Is there a reason to not use the same parameters for torchaudio.transforms.MFCC?
For instance, you use hanning for both kaldifeat and torchaudio.compliance.kaldi.mfcc, but you leave torchaudio.transforms.MFCC to use its default value, though I am not sure whether its default value is hanning or not.

Also, you are using n_mfcc=13, for torchaudio.transforms.MFCC. Is there any reason to not use the same value for kaldifeat and torchaudio.compliance.kaldi.mfcc?

If you want to produce the same features for the same input, please ensure

  • you indeed use the same input
  • you indeed use the same arguments

@binhtranmcs
Author

For instance, you use hanning for both kaldifeat and torchaudio.compliance.kaldi.mfcc, but you leave torchaudio.transforms.MFCC to use its default value, though I am not sure whether its default value is hanning or not.

Also, you are using n_mfcc=13, for torchaudio.transforms.MFCC. Is there any reason to not use the same value for kaldifeat and torchaudio.compliance.kaldi.mfcc?

@csukuangfj, hanning is the default of torchaudio.transforms.MFCC and num_ceps=13 is the default of kaldifeat.
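One caveat on the window: the name "hanning" does not guarantee identical coefficients in both libraries. As far as I know, torch.hann_window defaults to periodic=True (denominator N), while Kaldi's hanning window uses the symmetric form (denominator N - 1); kaldifeat is assumed here to follow Kaldi. Both points are assumptions worth verifying against the installed versions. A plain-Python sketch of the two formulas:

```python
import math


def hann_periodic(n):
    """Periodic Hann window: 0.5 - 0.5 * cos(2*pi*i / n)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def hann_symmetric(n):
    """Symmetric Hann window: 0.5 - 0.5 * cos(2*pi*i / (n - 1))."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


n = 400  # 25 ms at 16 kHz, as in the script above
periodic = hann_periodic(n)
symmetric = hann_symmetric(n)

max_gap = max(abs(p - s) for p, s in zip(periodic, symmetric))
print(f"last sample: periodic={periodic[-1]:.2e}, symmetric={symmetric[-1]:.2e}")
print(f"largest coefficient gap: {max_gap:.2e}")
```

The gap is small, so it would not explain differences as large as the ones in the table, but it would prevent a bit-exact match.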

I just updated the Python code above to add those arguments. The result is unchanged.

@csukuangfj
Owner

csukuangfj commented Dec 7, 2023

Please show your complete code after your changes.

@csukuangfj
Owner

csukuangfj commented Dec 7, 2023

Also, have you read and checked the following two points?

If you want to produce the same features for the same input, please ensure

  • you indeed use the same input
  • you indeed use the same arguments

@csukuangfj
Owner

I strongly suggest that you have a look at
https://pytorch.org/audio/main/_modules/torchaudio/transforms/_transforms.html#MFCC

You need to find all the arguments of MFCC and compare them with kaldifeat and torchaudio.compliance.kaldi.mfcc.
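One way to enumerate the arguments and their defaults without reading the source by hand is Python's inspect module. The sketch below is only an illustration: list_defaults is a made-up helper, and example_transform is a hypothetical stand-in so that the snippet runs without torchaudio installed.

```python
import inspect


def list_defaults(fn):
    """Return {parameter name: default value} for parameters that have a default."""
    params = inspect.signature(fn).parameters
    return {
        name: p.default
        for name, p in params.items()
        if p.default is not inspect.Parameter.empty
    }


# Hypothetical stand-in for a feature extractor constructor.
def example_transform(sample_rate=16000, n_mfcc=40, log_mels=False, melkwargs=None):
    pass


defaults = list_defaults(example_transform)
for name, value in defaults.items():
    print(f"{name} = {value!r}")
```

With torchaudio installed, calling list_defaults(torchaudio.transforms.MFCC) and list_defaults(torchaudio.transforms.MelSpectrogram) should give the values to compare against the kaldifeat options, assuming both constructors expose ordinary Python signatures.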

You need to spend time figuring out the reason by yourself.

For instance, you use the default value log_mels=False for MFCC, which is not correct if you want to
get the same features as kaldifeat and torchaudio.compliance.kaldi.mfcc.
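To illustrate why log_mels matters: as far as I can tell from the source linked above, MFCC with log_mels=False converts the mel spectrogram to decibels (10 * log10, followed by a dynamic-range clamp), whereas log_mels=True takes a natural log with a small offset, which is closer to what Kaldi-style features use. The plain-Python sketch below compares the two scalings. The function names and sample energies are made up, the amin and eps constants are assumptions, and the clamp is omitted.

```python
import math


def to_db(power, amin=1e-10):
    """Decibel scaling of a power value: 10 * log10(max(power, amin))."""
    return 10.0 * math.log10(max(power, amin))


def to_log(power, eps=1e-6):
    """Natural-log scaling with a small offset to avoid log(0)."""
    return math.log(power + eps)


# Hypothetical mel-band energies (illustrative values only).
energies = [1e3, 1e4, 1e5]

db_vals = [to_db(e) for e in energies]
ln_vals = [to_log(e) for e in energies]
ratios = [d / l for d, l in zip(db_vals, ln_vals)]

for e, d, l, r in zip(energies, db_vals, ln_vals, ratios):
    print(f"energy={e:.0e}  dB={d:.3f}  ln={l:.3f}  ratio={r:.4f}")
```

Away from the floor values, the two scalings differ by a constant factor of 10 / ln(10), roughly 4.34, and the DCT carries that factor into every cepstral coefficient. So under these assumptions, log_mels=False cannot match a natural-log pipeline even when every other setting agrees.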
