Hi,
I've been trying to figure out how windowing with MFCC is done. Basically, I want to generate a single MFCC vector for 1 second of a sound file.
From my understanding, you can pass the window size and hop length as parameters to feature.mfcc. However, setting these parameters does not work as I expected.
Consider the following example:
y, sr = librosa.load(librosa.util.example_audio_file(), offset=10, duration=1)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=10, hop_length=sr, n_fft=sr)
By setting hop_length = n_fft = sr, I would expect windows of size sr with a hop of sr. From my understanding, this should return exactly 1 MFCC vector, so that the shape of mfcc is (10,1). However, the above example returns an mfcc with the shape (10,2).
I am quite sure this has something to do with my hop_length and n_fft parameters (maybe some kind of offset error), but I am unable to figure it out.
I would appreciate a clarification,
Thanks
However, the above example returns a mfcc with the shape (10,2).
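This is because in librosa, all analysis is frame-centered by default. So the first mfcc will be a window centered around sample 0, not starting at sample 0. To make this happen, the underlying STFT pads the signal by half a frame (n_fft // 2 samples) on each side. With hop_length = n_fft = sr and a 1-second signal, that gives two frames, centered on samples 0 and sr.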
If you want a left-aligned analysis, you can compute the STFT directly with centering disabled, something like:
>>> import numpy as np
>>> import librosa
>>> D = librosa.stft(y, hop_length=sr, n_fft=sr, center=False)
>>> melspec = librosa.feature.melspectrogram(S=np.abs(D)**2, sr=sr)
>>> mfcc = librosa.feature.mfcc(S=librosa.power_to_db(melspec), n_mfcc=10)
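To see where the (10,2) versus (10,1) shapes come from, here is a minimal sketch of the frame-count arithmetic in plain Python. The helper name n_frames is hypothetical (it is not a librosa function), and the sketch assumes the default sample rate of 22050 and a signal of exactly sr samples:

```python
def n_frames(n_samples, n_fft, hop_length, center=True):
    """Number of analysis frames for a signal of n_samples samples.

    Hypothetical helper for illustration; not part of librosa.
    """
    if center:
        # Centering pads n_fft // 2 samples on each side before framing.
        n_samples = n_samples + 2 * (n_fft // 2)
    return 1 + (n_samples - n_fft) // hop_length


sr = 22050  # assumed: librosa's default sample rate, 1 second of audio
centered = n_frames(sr, n_fft=sr, hop_length=sr, center=True)
left_aligned = n_frames(sr, n_fft=sr, hop_length=sr, center=False)
print(centered, left_aligned)
```

Under these assumptions the centered count is 2 and the left-aligned count is 1, which matches the shapes discussed above.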
Hi Bmcfee, I also encountered a similar problem today.
In my case, librosa.feature.melspectrogram() and librosa.util.frame produce different numbers of frames. The former gives exactly two more than the latter. I guess this is caused by the frame centering in the STFT.
Thank you~
That's right -- frame doesn't change the input data at all (for instance, it doesn't pad). It just provides a reshaped view of the same data. STFT pads first and then frames.
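The difference can be reproduced without librosa by listing frame start positions with and without padding. This is only a sketch: frame_starts is a hypothetical helper, and the values below (22050 samples, frame length 2048, hop 1024) are assumed for illustration, since the parameters used above were not stated. The size of the gap depends on the ratio of frame length to hop length.

```python
def frame_starts(n_samples, frame_length, hop_length):
    """Start index of every full frame that fits in n_samples.

    Hypothetical helper for illustration; not part of librosa.
    """
    return list(range(0, n_samples - frame_length + 1, hop_length))


# Assumed example values; the hop is half the frame length.
n_samples, frame_length, hop_length = 22050, 2048, 1024

# Framing the raw signal, with no padding.
plain = frame_starts(n_samples, frame_length, hop_length)

# Padding frame_length // 2 samples on each side first, then framing,
# as a centered STFT does.
padded = frame_starts(n_samples + 2 * (frame_length // 2), frame_length, hop_length)

print(len(plain), len(padded))
```

With a hop of half the frame length, the padded version yields two more frames than the unpadded one; a smaller hop would widen the gap.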
This solved my problem, thanks for the explanation!