DeepSpeech: Silence is inferred as "i"

Created on 5 Dec 2019 · 11 Comments · Source: mozilla/DeepSpeech

I am getting "i" as the output for silence.

OS Platform: Archlinux
Python version: 3.8.0

I followed these steps to install DeepSpeech:

virtualenv -p python3 $HOME/tmp/deepspeech-venv/
source $HOME/tmp/deepspeech-venv/bin/activate

pip3 install deepspeech

curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.0/deepspeech-0.6.0-models.tar.gz
tar xvf deepspeech-0.6.0-models.tar.gz

Commands to reproduce:

First, create a 400 ms silent WAV file:
sox -n -r 16k -c1 -b 16 400ms-silence.wav trim 0.0 0.4
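
For readers without sox, a comparable silent file can probably be produced with Python's standard `wave` module. This is a minimal sketch, not part of the original report: the helper name `write_silence` is hypothetical, and it assumes the same parameters as the sox command above (16 kHz, mono, 16-bit, 0.4 s).

```python
import wave


def write_silence(path, duration_s=0.4, sample_rate=16000):
    """Write a mono, 16-bit PCM WAV file that contains only zero samples.

    Hypothetical helper; the defaults mirror the sox command in the report.
    Returns the number of frames written.
    """
    n_frames = int(round(duration_s * sample_rate))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 2 bytes per sample = 16 bits
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # every sample is zero
    return n_frames


if __name__ == "__main__":
    write_silence("400ms-silence.wav")
```

The resulting file can then be passed to the `--audio` option in the same way as the sox-generated one.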

deepspeech --model deepspeech-0.6.0-models/output_graph.pbmm --lm deepspeech-0.6.0-models/lm.binary --trie deepspeech-0.6.0-models/trie --audio 400ms-silence.wav --beam_width 500 --lm_alpha 0.75 --lm_beta 1.85

Output

Loading model from file deepspeech-0.6.0-models/output_graph.pbmm
TensorFlow: v1.14.0-21-ge77504a
DeepSpeech: v0.6.0-0-g6d43e21
Loaded model in 0.0107s.
Loading language model from files deepspeech-0.6.0-models/lm.binary deepspeech-0.6.0-models/trie
Loaded language model in 0.000166s.
Running inference.
i
Inference took 0.373s for 0.400s audio file.

abdul-rehman0

👍1

All 11 comments

I'm having the same problem when streaming live audio to the Node.js example: I get "i" for silence. I also posted this on Discourse for reference.

JohannesW11K on 16 Dec 2019

I believe it's caused by this check I added with the UTF-8 changes: https://github.com/mozilla/DeepSpeech/blob/551b3dd5f5c36f49af9dc562c69e78d705daee18/native_client/ctcdecode/ctc_beam_search_decoder.cpp#L172-175

I'm on vacation with no access to a laptop for the next two weeks so I can't test it and make a PR until then.

reuben on 16 Dec 2019

Do you remember the exact reason for this check? I can likely try to fix it, but I've lost context on your UTF-8 work :)

lissyx on 17 Dec 2019

Just remove it and see if it stops giving "i" for silence. It was added because I wanted to penalize empty beams further, since it's somewhat common to have single-grapheme transcripts in the Mandarin datasets, and those were sometimes being transcribed as empty strings.

reuben on 17 Dec 2019

Done in https://github.com/mozilla/DeepSpeech/pull/2607

@abdul-rehman0 @dsteinman I might need your feedback as well once we have a green PR to test that on your side.

lissyx on 17 Dec 2019

👍1

@lissyx Thanks, will do :-).

abdul-rehman0 on 17 Dec 2019

It's ready: https://community-tc.services.mozilla.com/tasks/groups/XInpWzdaQ4u1pIROntylgA you can pick your own flavor and test :)

lissyx on 17 Dec 2019

Fixed, Thanks :). Tested with DeepSpeech Linux AMD64 CPU

abdul-rehman0 on 17 Dec 2019

👍2

Have there been any reports of a similar problem, where an empty stream is inferred as "t"?

It happens quite a lot in some of the code I'm working with. This is the line that causes it:

deepSpeechModel.finishStream(modelStream);

dsteinman on 2 Jan 2020
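
On a build that still has the empty-transcript penalty, one possible application-side guard is to track whether any audio was fed and to discard the transcript when the stream was empty. This is a hypothetical sketch, not something proposed in the thread: `GuardedStream` is an invented wrapper, written here in Python rather than JavaScript, and it assumes only that the model object exposes `feedAudioContent(stream, buffer)` and `finishStream(stream)`, which are believed to match the 0.6 bindings.

```python
class GuardedStream:
    """Hypothetical wrapper that ignores transcripts from empty streams.

    `model` is any object with feedAudioContent(stream, buffer) and
    finishStream(stream); these method names are an assumption about
    the bindings in use.
    """

    def __init__(self, model, stream):
        self._model = model
        self._stream = stream
        self._samples_fed = 0

    def feed(self, buffer):
        # Count the samples so finish() knows whether the stream was empty.
        self._samples_fed += len(buffer)
        self._model.feedAudioContent(self._stream, buffer)

    def finish(self):
        # Still finish the stream so the model can close it, but drop
        # the text when no audio was ever fed.
        text = self._model.finishStream(self._stream)
        return text if self._samples_fed > 0 else ""
```

This only covers streams that received no samples at all; it does nothing for streams that contain real but silent audio, which is the case the decoder fix addresses.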

The actual resulting transcript wasn't specified in the code that caused it. The code just penalized empty transcripts to such an extent that the model would choose a single-letter transcript instead. "i" and "a" happen to be very frequent unigrams in English, which is probably why they are seen a lot, but it doesn't have to be just those two.

reuben on 2 Jan 2020
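
The effect described above can be illustrated with a toy rescoring example. This is only a sketch: it is not the DeepSpeech decoder, the function `best_transcript` is hypothetical, and the probabilities and penalty value are invented for illustration.

```python
import math


def best_transcript(candidates, empty_penalty=0.0):
    """Pick the highest-scoring candidate after penalizing the empty string.

    `candidates` maps transcript -> log-probability. The penalty is
    subtracted only from the empty transcript's score.
    """
    def score(item):
        text, logp = item
        return logp - empty_penalty if text == "" else logp

    return max(candidates.items(), key=score)[0]


# Invented scores for a silent clip: the empty transcript is by far the
# most likely, followed by frequent single-letter words.
beams = {"": math.log(0.90), "i": math.log(0.06), "a": math.log(0.04)}

print(repr(best_transcript(beams)))                     # ''
print(repr(best_transcript(beams, empty_penalty=5.0)))  # 'i'
```

With no penalty the empty transcript wins. Once the penalty is large enough, the next most likely hypothesis takes over, and for English that tends to be a frequent single-letter word.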

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.