Librosa: stft seems to have very high reconstruction error for hop sizes other than 1/4 of window size.

Created on 2 Feb 2017 · 3 Comments · Source: librosa/librosa

Here's a bit of code that replicates this:

import librosa
import numpy as np

y, sr = librosa.load(librosa.util.example_audio_file())

n_fft = 4096
hop_length = 1024

n = len(y)
y_pad = librosa.util.fix_length(y, n + hop_length)
D = librosa.stft(y_pad, n_fft = n_fft, hop_length = hop_length)
y_out = librosa.util.fix_length(librosa.istft(D), n)
print(np.max(np.abs(y - y_out)))

n_fft = 4096
hop_length = 2048

n = len(y)
y_pad = librosa.util.fix_length(y, n + hop_length)
D = librosa.stft(y_pad, n_fft = n_fft, hop_length = hop_length)
y_out = librosa.util.fix_length(librosa.istft(D), n)
print(np.max(np.abs(y - y_out)))

n_fft = 2048
hop_length = 512

n = len(y)
y_pad = librosa.util.fix_length(y, n + hop_length)
D = librosa.stft(y_pad, n_fft = n_fft, hop_length = hop_length)
y_out = librosa.util.fix_length(librosa.istft(D), n)
print(np.max(np.abs(y - y_out)))

n_fft = 2048
hop_length = 1024

n = len(y)
y_pad = librosa.util.fix_length(y, n + hop_length)
D = librosa.stft(y_pad, n_fft = n_fft, hop_length = hop_length)
y_out = librosa.util.fix_length(librosa.istft(D), n)
print(np.max(np.abs(y - y_out)))

n_fft = 2048
hop_length = 256

n = len(y)
y_pad = librosa.util.fix_length(y, n + hop_length)
D = librosa.stft(y_pad, n_fft = n_fft, hop_length = hop_length)
y_out = librosa.util.fix_length(librosa.istft(D), n)
print(np.max(np.abs(y - y_out)))

Here is what happens when I run it:

1.19209e-07
0.767519
1.19209e-07
0.710516
0.734461

What's happening here? An alignment issue or something else?

Labels: question, wontfix

All 3 comments

Did you try plotting y - y_out? Maybe the large values come from some
difference in framing, so that there are some samples at the end that don't
get resynthesized. Another way to test this would be to pad a window's
worth of zeros to the end of the input and see if that helps.

DAn.
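
If plotting is inconvenient, the same check can be done numerically. The following is only a sketch: `error_profile` is a hypothetical helper (not part of librosa), and the toy arrays stand in for `y` and `y_out` from the snippet above. It reports the largest error in each consecutive chunk, which shows whether the mismatch is confined to the tail or spread over the whole signal.

```python
import numpy as np


def error_profile(y, y_out, n_segments=10):
    """Max absolute error in each of n_segments consecutive chunks.

    Hypothetical helper for illustration; not a librosa function.
    """
    err = np.abs(np.asarray(y) - np.asarray(y_out))
    return [float(chunk.max()) for chunk in np.array_split(err, n_segments)]


# Toy data (not from the issue): a reconstruction that only loses the tail.
y = np.sin(np.linspace(0.0, 100.0, 1000))
y_out = y.copy()
y_out[950:] = 0.0

profile = error_profile(y, y_out)
print(profile)
```

An error that shows up only in the last chunk would point to a framing problem at the end of the signal; large values in every chunk point to something else.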

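The reason is that you didn't pass hop_length to istft. When it is omitted, istft falls back to its default hop of win_length // 4 (here n_fft // 4), so the output only matches the input when the stft hop is a quarter of the window. Replacing librosa.istft(D) with librosa.istft(D, hop_length = hop_length) I got: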

1.19209e-07
1.49012e-07
1.19209e-07
1.19209e-07
1.49012e-07

Ah shoot, yes that's it, of course. My bad! Thanks so much!
