DeepSpeech: WAV format

Created on 11 Jun 2018 · 8 Comments · Source: mozilla/DeepSpeech

Hey guys,

I am benchmarking the performance of your pre-trained model on my own repertoire of wav files (sample rate: 16000, audio channels: 1, bits per sample: 16) and got this error:

deepspeech output_graph.pbmm 2.wav alphabet.txt lm.binary trie
dyld: warning, LC_RPATH @executable_path in /Library/Python/2.7/site-packages/deepspeech/_model.so being ignored in restricted program because of @executable_path
dyld: warning, LC_RPATH $ORIGIN/../_solib_darwin/_U_S_Stensorflow_Clibtensorflow_Ucc.so___Utensorflow in /Library/Python/2.7/site-packages/deepspeech/lib/libtensorflow_cc.so being ignored in restricted program because it is a relative path
Loading model from file output_graph.pbmm
2018-06-10 12:39:39.450406: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 1.291s.
Loading language model from files lm.binary trie
Loaded language model in 3.243s.
/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/io/wavfile.py:172: WavFileWarning: Chunk (non-data) not understood, skipping it.
  WavFileWarning)
Traceback (most recent call last):
  File "/usr/local/bin/deepspeech", line 11, in <module>
    sys.exit(main())
  File "/Library/Python/2.7/site-packages/deepspeech/client.py", line 66, in main
    fs, audio = wav.read(args.audio)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/io/wavfile.py", line 173, in read
    _skip_unknown_chunk(fid)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/io/wavfile.py", line 90, in _skip_unknown_chunk
    size = struct.unpack(fmt, data)[0]
struct.error: unpack requires a string argument of length 4

Could you please also tell me the formula relating these three numbers, so that I can double-check that I am doing the right conversion? (I wrote my own script to convert my wav files so that they match the format DeepSpeech expects, but I could be wrong.)
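
For uncompressed PCM, the two derived fields in a WAV header follow directly from those three numbers: block align = channels * bits per sample / 8, and byte rate = sample rate * block align. A minimal sketch of that arithmetic (the function name is hypothetical, chosen for illustration):

```python
def pcm_header_fields(sample_rate, channels, bits_per_sample):
    """Derive the block-align and byte-rate fields of a PCM WAV header."""
    bytes_per_sample = bits_per_sample // 8
    block_align = channels * bytes_per_sample   # bytes per frame (one sample per channel)
    byte_rate = sample_rate * block_align       # bytes of audio per second
    return {"block_align": block_align, "byte_rate": byte_rate}


# The settings quoted in the question: 16 kHz, mono, 16-bit.
fields = pcm_header_fields(16000, 1, 16)
print(fields)  # {'block_align': 2, 'byte_rate': 32000}
```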

coco90417

All 8 comments

Hi,

These three numbers (sample rate: 16000, audio channels: 1, bits per sample: 16) are correct. I think the file that was passed for inference was not a valid wav file.

Can you share your conversion script?
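
One way to check whether a file parses as WAV at all, and whether it carries the three expected parameters, is Python's standard `wave` module. The sketch below is only an illustration: the function names and the `EXPECTED` dictionary are hypothetical, and the demo builds a tiny in-memory file instead of reading the reporter's `2.wav`.

```python
import io
import wave

# The parameters discussed in this thread.
EXPECTED = {"sample_rate": 16000, "channels": 1, "bits_per_sample": 16}


def describe_wav(source):
    """Return the basic PCM parameters of a WAV file (path string or file-like object).

    wave.open raises wave.Error if the data is not a parseable RIFF/WAVE file.
    """
    with wave.open(source, "rb") as wav_file:
        return {
            "sample_rate": wav_file.getframerate(),
            "channels": wav_file.getnchannels(),
            "bits_per_sample": wav_file.getsampwidth() * 8,
        }


def matches_expected(source):
    """True if the file has exactly the expected parameters."""
    return describe_wav(source) == EXPECTED


# Demo on an in-memory file so the sketch runs without any audio on disk.
demo = io.BytesIO()
with wave.open(demo, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)          # 2 bytes = 16 bits per sample
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 160)
demo.seek(0)
demo_ok = matches_expected(demo)
```

For a file on disk, `describe_wav("2.wav")` would be the equivalent call.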

solomonope on 11 Jun 2018

Yes, I second @solomonope. This looks really odd, and since you say you did your own conversion, I'd like to verify exactly what you did, @coco90417.

lissyx on 11 Jun 2018

Sure! samplingrate = 16000

private void writeWaveHeader(ByteBuffer buffer, float samplingrate, int length) {
    /* RIFF Chunk. */
    buffer.put("RIFF".getBytes());
    buffer.putInt(36 + length + 2);
    buffer.put("WAVE".getBytes()); /* WAV format. */

    /* Format chunk. */
    buffer.put("fmt ".getBytes()); /* Begin of the format chunk. */
    buffer.putInt(16); /* Length of the Format chunk. */
    buffer.putShort((short)1); /* Format: 1 = Raw PCM (linear quantization). */
    buffer.putShort((short)1); /* Number of channels. */
    buffer.putInt((int)samplingrate); /* Samplingrate. */
    buffer.putInt((int)(samplingrate * 4)); /* Byte rate. */
    buffer.putShort((short)2); /* Size of frame. */
    buffer.putShort((short)AudioFrame.BITS_PER_SAMPLE); /* Bits per sample. */

    /* Data chunk */
    buffer.put("data".getBytes()); /* Begin of the data chunk. */
    buffer.putInt(length); /* Length of the data chunk. */
}
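
For comparison, here is a hedged Python sketch of the canonical 44-byte PCM header, checked against the standard `wave` reader. It differs from the Java snippet above in three places: it packs every integer little-endian, as the RIFF format requires (a Java `ByteBuffer` is big-endian unless `order(ByteOrder.LITTLE_ENDIAN)` is called, which the snippet does not show); it writes the RIFF size as 36 + data length; and it computes the byte rate as sample rate * block align, which is 32000 for these settings. Whether any of these explains the `struct.error` above is an assumption, since the byte order may be set elsewhere. The function name is hypothetical.

```python
import io
import struct
import wave


def build_wav_header(sample_rate, channels, bits_per_sample, data_length):
    """Return a 44-byte canonical PCM WAV header for data_length bytes of audio."""
    block_align = channels * bits_per_sample // 8   # bytes per frame
    byte_rate = sample_rate * block_align            # bytes per second
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",       # "<" = little-endian, no padding
        b"RIFF", 36 + data_length, b"WAVE",
        b"fmt ", 16,                # the PCM format chunk is 16 bytes long
        1,                          # audio format 1 = linear PCM
        channels, sample_rate, byte_rate, block_align, bits_per_sample,
        b"data", data_length,
    )


# Round-trip check: 10 ms of 16 kHz mono 16-bit silence, parsed by the stdlib reader.
silence = b"\x00\x00" * 160
header = build_wav_header(16000, 1, 16, len(silence))
with wave.open(io.BytesIO(header + silence), "rb") as parsed:
    parsed_params = (
        parsed.getframerate(),
        parsed.getnchannels(),
        parsed.getsampwidth(),
        parsed.getnframes(),
    )
```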

coco90417 on 12 Jun 2018

@coco90417 TBH, looking at the code, I can't tell whether it's correct or wrong. I think we should try the following: convert the audio using this link https://audio.online-convert.com/convert-to-wav

If inference succeeds on the converted file, then we know the problem is the audio.

Also, below is a snippet I have used to convert audio from other formats to wav. I have used its output successfully with the inference engine.

from pydub import AudioSegment
from pathlib import Path

audio_path = Path("./audio")

for directory in audio_path.iterdir():
    prefix = directory.name

    for audio in list(directory.glob("*.webm")):
        try:
            audio_name = "{0}-{1}.wav".format(prefix, audio.name.split('.')[0])
            AudioSegment.from_file(audio, "webm").set_channels(1).export("{0}/{1}".format("./wav", audio_name),
                                                                         format="wav")
        except Exception as es:
            print(es)

requirements.txt
numpy
scipy
pydub

You will need to have ffmpeg (https://www.ffmpeg.org/) installed. Here is an installation guide for Mac: http://www.renevolution.com/ffmpeg/2013/03/16/how-to-install-ffmpeg-on-mac-os-x.html
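
Note that the snippet above only forces mono with `set_channels(1)`; the sample rate and sample width stay whatever the source file had. A hedged variant that pins all three parameters discussed in this thread could look like the sketch below. `set_frame_rate`, `set_channels`, `set_sample_width`, `from_file` and `export` are pydub `AudioSegment` methods; the function and constant names are hypothetical, and `convert_file` still needs pydub and ffmpeg installed.

```python
TARGET_RATE = 16000       # Hz
TARGET_CHANNELS = 1       # mono
TARGET_SAMPLE_WIDTH = 2   # bytes, i.e. 16 bits per sample


def normalize(segment):
    """Apply the three target parameters to a pydub AudioSegment."""
    return (
        segment.set_frame_rate(TARGET_RATE)
        .set_channels(TARGET_CHANNELS)
        .set_sample_width(TARGET_SAMPLE_WIDTH)
    )


def convert_file(source, destination):
    """Decode any ffmpeg-readable file and write it as 16 kHz mono 16-bit WAV."""
    # Imported here so this module still loads where pydub is not installed.
    from pydub import AudioSegment

    segment = AudioSegment.from_file(str(source))
    normalize(segment).export(str(destination), format="wav")
```

A call such as `convert_file("audio/clip.webm", "wav/clip.wav")` would then replace the `from_file(...).set_channels(1).export(...)` chain; both paths here are placeholders.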

solomonope on 12 Jun 2018

Use pysox and don't reinvent the wheel:
http://pysox.readthedocs.io/en/latest/example.html