Data loaded from a wav file differs between popular audio libs.
from pydub import AudioSegment
import soundfile
import librosa
import numpy as np
audio = AudioSegment.from_file('matt_00007.wav', format='WAV')
audio2, samplerate2 = soundfile.read('matt_00007.wav')
audio3, samplerate3 = librosa.load('matt_00007.wav', sr=16000)
# sample rate of data loaded in by pydub is 16000 Hz
print(np.array(audio.get_array_of_samples())[:5])
print((audio2 * samplerate2)[:5])
print((audio3 * samplerate3)[:5])
# Output:
[ 259 264 359 -244 317]
[ 126.46484375 128.90625 175.29296875 -119.140625 154.78515625]
[ 126.46484 128.90625 175.29297 -119.140625 154.78516 ]
Outputs should be the same. I need to guarantee consistency when I generate MFCCs on other platforms, and possibly in C#, so I need to know which is correct.
Output differs. See above.
Darwin-18.6.0-x86_64-i386-64bit
Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:07:29)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.16.4
SciPy 1.2.1
librosa 0.7.0
The main difference here comes from the choice of dtype. Librosa defaults to float32, soundfile defaults to float64 (hence the slightly higher precision), and it looks like pydub is returning integer-valued samples. I'm not sure I follow why you're multiplying the sample values by the sampling rate, though. If this is 16-bit audio, then you should be multiplying by 32768 (2^15) to get back to the integer range. On my machine, for your file, this produces:
In [18]: (y * 32768)[:5]
Out[18]: array([ 259., 264., 359., -244., 317.], dtype=float32)
which matches your reported values for pydub.
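For anyone landing here later, here is a minimal sketch of the scaling described above. It is not how pydub, soundfile, or librosa work internally; it only uses the standard-library `wave` module on a small in-memory file, and the sample values are the five integers reported in this issue, reused as illustrative data. It assumes mono 16-bit PCM.

```python
import array
import io
import sys
import wave

# Illustrative data: the five integer samples reported in this issue.
pcm_samples = [259, 264, 359, -244, 317]

# Write a tiny mono, 16-bit, 16 kHz WAV file into memory.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
    writer.setframerate(16000)
    data = array.array("h", pcm_samples)
    if sys.byteorder == "big":
        data.byteswap()          # WAV stores samples little-endian
    writer.writeframes(data.tobytes())

# Read the raw integer samples back, as an integer-based loader would.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    raw = array.array("h")
    raw.frombytes(reader.readframes(reader.getnframes()))
    if sys.byteorder == "big":
        raw.byteswap()

as_int = list(raw)

# Float-based loaders typically normalize 16-bit samples to [-1.0, 1.0)
# by dividing by 2**15 = 32768.
as_float = [s / 32768 for s in as_int]

# Multiplying by 32768 (not by the sampling rate) recovers the integers.
rescaled = [f * 32768 for f in as_float]

print(as_int)
print(as_float)
print(rescaled)
```

The integer and float representations carry the same information, so neither one is "more correct"; what matters for cross-platform consistency is applying the same scale factor everywhere.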
Librosa and soundfile appear to be in agreement (up to numerical precision), which is all we can guarantee from our side. Librosa does not support integer-valued samples because many of the downstream analyses (STFT, etc.) would implicitly cast to floating point anyway, so we opted to put that requirement up front in the audio buffer validation check.
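As a rough illustration of the precision point, the sketch below compares float32 and float64 versions of the same 16-bit samples using NumPy only. It assumes 16-bit PCM input and no resampling on load; the array reuses the five values from this issue as illustrative data. Under those assumptions, dividing by a power of two is exact in both dtypes, so the two float arrays hold the same values and differ only in how many digits get printed.

```python
import numpy as np

# Illustrative data: the five integer samples reported in this issue.
pcm = np.array([259, 264, 359, -244, 317], dtype=np.int16)

# Normalize the same samples in single and double precision.
as_float32 = pcm.astype(np.float32) / np.float32(32768)
as_float64 = pcm.astype(np.float64) / 32768.0

print(as_float32.dtype, as_float64.dtype)

# Compare the values after widening float32 to float64.
same_values = np.array_equal(as_float32.astype(np.float64), as_float64)
print(same_values)

# Scale back to the 16-bit integer range and round before casting.
recovered = np.round(as_float32 * 32768).astype(np.int16)
print(recovered)
```

If you port this to C# or another platform, keeping the same dtype and the same 32768 scale factor should give matching inputs to the MFCC step, though any resampling or later processing can still introduce small floating-point differences.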
To summarize: I think everything here is behaving to spec.
You are totally right. My mistake. Please forgive my naivety as I begin working with audio in Python. Thank you!
no problem, glad we could sort it out quickly!