Librosa: Window size and hop length for mfcc

Created on 19 Nov 2018 · 4 Comments · Source: librosa/librosa

Hi,

I've been trying to figure out how windowing works when computing MFCCs. Basically, I want to generate a single MFCC vector for 1 second of a sound file.

As I understand it, you can pass the window size and hop length as parameters to librosa.feature.mfcc. However, setting these parameters does not work as I expected.

Consider the following example:

import librosa

y, sr = librosa.load(librosa.util.example_audio_file(), offset=10, duration=1)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=10, hop_length=sr, n_fft=sr)

So, by setting hop_length = n_fft = sr, I would expect windows of size sr with a hop of sr. From my understanding, this should return exactly 1 MFCC vector, so that the shape of mfcc is (10, 1). However, the above example returns an array with the shape (10, 2).

I am quite sure this has something to do with my hop_length and n_fft parameters (maybe some kind of offset error). However, I am unable to figure it out.

I would appreciate a clarification,
Thanks

question

Depansho

Most helpful comment

However, the above example returns a mfcc with the shape (10,2).

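This is because in librosa, all analysis is frame-centered by default. So the first MFCC frame comes from a window centered on sample 0, not starting at sample 0. To make this happen, the underlying STFT pads the signal by half a frame (n_fft // 2 samples) on each side. With one second of audio and hop_length = n_fft = sr, the padded signal is two frames long, which is why you get the shape (10, 2).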
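To make the frame count concrete, here is a minimal sketch of the arithmetic, assuming padding of n_fft // 2 samples on each side when centering is on. The helper `frame_count` is hypothetical and written for illustration only; it is not part of librosa.

```python
def frame_count(n_samples, n_fft, hop_length, center=True):
    """Hypothetical helper: number of STFT frames for n_samples samples."""
    if center:
        # Assumption: centering pads n_fft // 2 samples on each side.
        n_samples = n_samples + 2 * (n_fft // 2)
    if n_samples < n_fft:
        return 0
    return 1 + (n_samples - n_fft) // hop_length


sr = 22050  # one second of audio at librosa.load's default sampling rate

print(frame_count(sr, n_fft=sr, hop_length=sr, center=True))   # 2
print(frame_count(sr, n_fft=sr, hop_length=sr, center=False))  # 1
```

With centering, the padded signal has 2 * sr samples, so a window of length sr fits twice. Without centering, it fits exactly once.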

If you want a left-aligned analysis, you can compute the STFT directly with centering disabled, something like:

>>> import numpy as np
>>> D = librosa.stft(y, hop_length=sr, n_fft=sr, center=False)
>>> melspec = librosa.feature.melspectrogram(S=np.abs(D)**2, sr=sr)
>>> mfcc = librosa.feature.mfcc(S=librosa.power_to_db(melspec), n_mfcc=10)

bmcfee on 19 Nov 2018

👍2

All 4 comments

Hi bmcfee, I also encountered a similar problem today.
In my case, the numbers of frames produced by librosa.feature.melspectrogram() and librosa.util.frame are different: the former is exactly two more than the latter. I guess the reason is the frame centering in the STFT.
Thank you~

Yaxiong2015 on 20 Nov 2018

👍1

That's right -- frame doesn't change the input data at all (for instance, it doesn't pad). It just provides a reshaped view of the same data. STFT pads first and then frames.
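The difference between "frame" and "pad, then frame" can be sketched on a toy signal in plain Python. The helpers `frame` and `centered_frame` below are hypothetical stand-ins, not librosa functions. The zero padding and the choice hop_length = frame_length / 2 are assumptions made for illustration. The real STFT may pad with other values, and the gap in frame counts depends on the hop.

```python
def frame(x, frame_length, hop_length):
    """Slice x into overlapping windows without modifying or padding it."""
    last_start = len(x) - frame_length
    return [x[i:i + frame_length] for i in range(0, last_start + 1, hop_length)]


def centered_frame(x, frame_length, hop_length):
    """Pad half a frame on each side (zeros, as an assumption), then frame."""
    pad = [0] * (frame_length // 2)
    return frame(pad + list(x) + pad, frame_length, hop_length)


x = list(range(1, 11))          # toy signal with 10 samples
plain = frame(x, 4, 2)          # windows start at samples 0, 2, 4, 6
centered = centered_frame(x, 4, 2)

print(len(plain), len(centered))  # 4 6
print(plain[0])                   # [1, 2, 3, 4]
print(centered[0])                # [0, 0, 1, 2]
```

With a hop of half the frame length, the padded version yields exactly two more frames, one extra at each end, which matches the count difference described above under that assumption.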

bmcfee on 20 Nov 2018

This solved my problem, thanks for the explanation!

Depansho on 22 Nov 2018

👍1
