Librosa: Window size and hop length for mfcc

Created on 19 Nov 2018 · 4 Comments · Source: librosa/librosa

Hi,

I've been trying to figure out how windowing is done with mfcc. Basically, I want to generate one MFCC vector for 1 second of a sound file.

From my understanding, you can provide the window size and hop length as parameters to feature.mfcc. However, setting these parameters does not work as expected.

Consider the following example:

import librosa

# Load exactly one second of audio, starting 10 seconds into the example file
y, sr = librosa.load(librosa.util.example_audio_file(), offset=10, duration=1)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=10, hop_length=sr, n_fft=sr)

So, by setting hop_length = n_fft = sr, I would expect windows of size sr with a hop of sr. From my understanding, this should return exactly 1 MFCC vector, so that the shape of mfcc is (10,1). However, the above example returns an mfcc with the shape (10,2).

I am quite sure this has something to do with my hop_length and n_fft parameters (maybe some kind of offset error). However, I am unable to figure it out.

I would appreciate a clarification,
Thanks

All 4 comments

> However, the above example returns a mfcc with the shape (10,2).

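This is because in librosa, all analysis is frame-centered by default. So the first MFCC frame comes from a window centered on sample 0, not one starting at sample 0. To make this happen, the underlying STFT pads the signal by half a frame on each side. With 1 second of audio and hop_length = n_fft = sr, the padded signal is about 2 * sr samples long, which is enough for two frames.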

If you want a left-aligned analysis, you can compute the STFT directly with centering disabled, something like:

>>> import numpy as np
>>> import librosa
>>> D = librosa.stft(y, hop_length=sr, n_fft=sr, center=False)
>>> melspec = librosa.feature.melspectrogram(S=np.abs(D)**2, sr=sr)
>>> mfcc = librosa.feature.mfcc(S=librosa.power_to_db(melspec), n_mfcc=10)

Hi Bmcfee, I also encountered a similar problem today.
In my case, the numbers of frames produced by librosa.feature.melspectrogram() and librosa.util.frame are different. The former is exactly two more than the latter. I guess this is caused by the frame-centering in the STFT.
Thank you~

That's right -- frame doesn't change the input data at all (for instance, it doesn't pad). It just provides a reshaped view of the same data. STFT pads first and then frames.
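
The difference can be sketched with plain Python lists. The helpers below are hypothetical stand-ins, not librosa functions, and the zero padding is an assumption made only for illustration (the number of frames does not depend on what the padding is filled with).

```python
def frame_no_pad(x, frame_length, hop_length):
    """Slice x into overlapping frames without padding it.

    Hypothetical list-based stand-in for a framing utility.
    """
    last_start = len(x) - frame_length
    return [x[i:i + frame_length] for i in range(0, last_start + 1, hop_length)]


def frame_centered(x, frame_length, hop_length):
    """Pad half a frame on each side first, then frame.

    Zero padding is an assumption chosen for illustration.
    """
    pad = [0] * (frame_length // 2)
    return frame_no_pad(pad + list(x) + pad, frame_length, hop_length)


x = list(range(1, 11))  # ten toy "samples": 1..10

plain = frame_no_pad(x, frame_length=4, hop_length=2)
centered = frame_centered(x, frame_length=4, hop_length=2)

print(len(plain), plain[0])        # 4 [1, 2, 3, 4]
print(len(centered), centered[0])  # 6 [0, 0, 1, 2]
```

In this toy setting the hop is half the frame length, so padding first yields two extra frames, and the first padded frame is centered on the first sample instead of starting at it.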

This solved my problem, thanks for the explanation!
