DeepSpeech: Error: SparseTensor is not valid

Created on 13 Jul 2017 · 9 Comments · Source: mozilla/DeepSpeech

I am training the model on some of my own sample audio files. I get an error on the training dataset:

W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: label SparseTensor is not valid: indices[25] = [0,25] is out of bounds: need 0 <= index < [1,25]
     [[Node: tower_0/CTCLoss = CTCLoss[ctc_merge_repeated=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](tower_0/Reshape_7/_623, tower_0/ToInt64/_625, tower_0/Gather, tower_0/Gather_DequeueMany:1)]]

Error trace:

Traceback (most recent call last):
  File "DeepSpeech.py", line 1654, in <module>
    tf.app.run()
  File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "DeepSpeech.py", line 1617, in main
    train()
  File "DeepSpeech.py", line 1523, in train
    job = COORD.next_job(job)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 478, in __exit__
    self._close_internal(exception_type)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 511, in _close_internal
    self._sess.close()
  File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 739, in close
    self._sess.close()
  File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 827, in close
    self._coord.join()
  File "/usr/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 390, in join
    " ".join(stragglers))
RuntimeError: Coordinator stopped with threads still running: Thread-13 Thread-22 Thread-19 Thread-17 Thread-3 Thread-14 Thread-6 Thread-20 Thread-21 Thread-18 Thread-10 Thread-8 Thread-4 Thread-7 Thread-15 Thread-12 Thread-23 Thread-16 Thread-9 Thread-11 Thread-24 Thread-5 Thread-25 Thread-26

However, if I run the same script on the dev dataset, it works fine.

Any thoughts on what is causing this?

harrypotter90

Most helpful comment

Note that deepspeech.utils.audioToInputVector implements striding as an optimization. If your dataset has a lot of very short audio files, striding may not be a good idea, since it halves the number of timesteps in the audio.

reuben on 13 Jul 2017

👍2
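
To make the effect of striding concrete, here is a minimal arithmetic sketch. It assumes that striding keeps every second feature frame and applies the n_chars < n_timesteps rule discussed in this thread; the function names and the numbers are hypothetical and are not part of DeepSpeech.

```python
def frames_after_striding(n_frames, stride=2):
    """Frames left when only every `stride`-th frame is kept (rounded up)."""
    return (n_frames + stride - 1) // stride


def fits(n_chars, n_frames, stride=1):
    """Apply the rule from this thread: n_chars < n_timesteps."""
    return n_chars < frames_after_striding(n_frames, stride)


# Hypothetical short clip: 30 feature frames, 20-character transcript.
print(fits(20, 30, stride=1))  # True: 20 < 30
print(fits(20, 30, stride=2))  # False: striding leaves only 15 timesteps
```

A clip that is long enough without striding can therefore fail once its timesteps are halved, which is why very short recordings are the ones most at risk.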

All 9 comments

It's hard for us to debug this, as you are using your own audio and transcripts, which we don't have access to.

However, I'd guess this is a result of your transcript having characters outside of the supported character set.

kdavis-mozilla on 13 Jul 2017

Thanks. I have seen that DS throws an error when there is a transcript but the audio is empty, for example when it contains only noise (see the attached audio; rename .txt to .wav).

Our transcripts are English only.

sample.txt

I am getting the error for the attached audio (replace ".txt" with ".wav"). Any thoughts or comments would help.

harrypotter90 on 13 Jul 2017

Usually this happens when the number of timesteps in your audio is strictly smaller than the number of characters in the corresponding transcript.
If you have libdeepspeech installed as described in the installation instructions, you can easily use deepspeech.utils.audioToInputVector in a Python script of your own to check each of your audio files and get its number of timesteps. You then just have to compare that to the number of characters in the corresponding transcript and delete the pairs that do not comply with the rule (n_chars < n_timesteps).

antho-rousseau on 13 Jul 2017

👍1
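
The check suggested above could look roughly like the sketch below. It does not call deepspeech.utils.audioToInputVector, because its signature is not given in this thread; instead it estimates the number of timesteps from the WAV duration. The 10 ms frame step and the stride of 2 are assumptions, as are the helper names, so treat the result as an approximation and prefer the real feature extraction if you have it available.

```python
import wave

WINDOW_STEP_S = 0.01  # assumption: one feature frame every 10 ms
STRIDE = 2            # assumption: only every second frame is kept


def estimate_timesteps(wav_path):
    """Roughly estimate the number of timesteps for a WAV file."""
    w = wave.open(wav_path, "rb")
    try:
        duration_s = w.getnframes() / float(w.getframerate())
    finally:
        w.close()
    n_frames = int(round(duration_s / WINDOW_STEP_S))
    # Keeping every STRIDE-th frame, rounded up.
    return (n_frames + STRIDE - 1) // STRIDE


def find_bad_pairs(pairs):
    """Return the (wav_path, transcript) pairs that break n_chars < n_timesteps."""
    return [(path, text) for path, text in pairs
            if not len(text) < estimate_timesteps(path)]
```

Calling find_bad_pairs with a list of (wav_path, transcript) tuples read from your own training list returns the pairs to inspect or drop before training.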

Yes, I was assuming there is a similar kind of check in DS.

So do you mean that if the audio is stretched, like "hhhhheeelllooooooooo", whereas the transcript is "hello", then we can get such errors?
Can I remove this check in DS?

Another attached audio file gives the same error; in it, the user has stretched the word "Haaagen".
valid.wav.txt

harrypotter90 on 13 Jul 2017

@harrypotter90 what @antho-rousseau means is that if the audio is very short, say 0.1 sec, while the text is very long, say the entire text of _War and Peace_

_"Well, Prince, so Genoa and Lucca are now just family estates of the Buonapartes. But I warn you..."_

then you'll get this type of error.

It's basically due to an incorrect audio + transcript pairing.

kdavis-mozilla on 13 Jul 2017

👍1
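
To connect this explanation with the error message in the original report, the following plain-Python sketch mimics the bounds check that fails. It assumes that the second dimension of the shape [1,25] in the message is the number of timesteps for that sample, and that character i of the transcript sits at sparse index [0, i]; the function name is hypothetical and this is not the TensorFlow implementation.

```python
def first_invalid_index(label, max_time):
    """Return the first sparse index [0, i] that falls outside [1, max_time].

    Returns None when every character of the label fits.
    """
    for i in range(len(label)):
        if i >= max_time:
            return [0, i]
    return None


print(first_invalid_index("a" * 26, 25))  # [0, 25]
print(first_invalid_index("a" * 25, 25))  # None
```

Under these assumptions, a 26-character transcript paired with 25 timesteps trips on index [0, 25], the same index reported in the log.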

OK, that explains why I am getting this error for some of the audio files, which are empty.

@antho-rousseau - I will implement the suggestion and remove these audio files.
Thanks

harrypotter90 on 13 Jul 2017

@tilmankamp: No, it's a custom audio dataset created by us. That's why we are getting some of these empty/noisy audio files, which are actually very difficult to find. Hopefully the suggestion provided by @antho-rousseau will help us remove these corrupted files.

harrypotter90 on 13 Jul 2017
