I have some sample audio files on which I am training the model. I get an error on the training dataset:
W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: label SparseTensor is not valid: indices[25] = [0,25] is out of bounds: need 0 <= index < [1,25]
[[Node: tower_0/CTCLoss = CTCLoss[ctc_merge_repeated=true, preprocess_collapse_repeated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](tower_0/Reshape_7/_623, tower_0/ToInt64/_625, tower_0/Gather, tower_0/Gather_DequeueMany:1)]]
Error trace:
Traceback (most recent call last):
File "DeepSpeech.py", line 1654, in <module>
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "DeepSpeech.py", line 1617, in main
train()
File "DeepSpeech.py", line 1523, in train
job = COORD.next_job(job)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 478, in __exit__
self._close_internal(exception_type)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 511, in _close_internal
self._sess.close()
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 739, in close
self._sess.close()
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 827, in close
self._coord.join()
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 390, in join
" ".join(stragglers))
RuntimeError: Coordinator stopped with threads still running: Thread-13 Thread-22 Thread-19 Thread-17 Thread-3 Thread-14 Thread-6 Thread-20 Thread-21 Thread-18 Thread-10 Thread-8 Thread-4 Thread-7 Thread-15 Thread-12 Thread-23 Thread-16 Thread-9 Thread-11 Thread-24 Thread-5 Thread-25 Thread-26
But if I run the same script on the dev dataset, it works fine.
Any thoughts on what is causing this?
It's hard for us to debug this, as you are using your own audio and transcripts that we don't have access to.
However, I'd guess this is a result of your transcripts containing characters outside of the supported character set.
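One way to test that guess is to scan every transcript for characters outside the alphabet. This is a minimal sketch that assumes the supported set is lowercase a to z, space and apostrophe; check the alphabet your DeepSpeech version actually uses. `SUPPORTED_CHARS` and `unsupported_chars` are hypothetical names, not part of DeepSpeech.

```python
# Assumption: supported alphabet = lowercase a-z, space and apostrophe.
# Replace this with the character set your DeepSpeech version really uses.
SUPPORTED_CHARS = set("abcdefghijklmnopqrstuvwxyz '")


def unsupported_chars(transcript):
    """Return the sorted list of characters in `transcript` not in SUPPORTED_CHARS."""
    return sorted(set(transcript) - SUPPORTED_CHARS)


if __name__ == "__main__":
    for text in ["hello world", "Hello, World!"]:
        # An empty list means the transcript only uses supported characters.
        print(repr(text), unsupported_chars(text))
```

Under this assumed alphabet, capital letters are flagged too, so lowercasing transcripts may already remove part of what the check reports.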
Thanks. I have seen that DS throws this error if there is a transcript but the audio is empty, e.g. only noise in the audio (see the attached audio; rename .txt to .wav).
Our transcripts are English only.
I am getting the error for this attached audio (replace ".txt" with ".wav"). Any thoughts and comments would help.
Usually this happens when the number of timesteps in your audio is strictly smaller than the number of characters in the corresponding transcript.
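The condition can be made slightly more precise. With ctc_merge_repeated=true (as in the CTCLoss node in the log above), CTC must put a blank frame between two identical consecutive labels, so each doubled letter costs one extra time step. A minimal sketch of that count; `min_ctc_timesteps` and `fits_ctc` are hypothetical helper names:

```python
def min_ctc_timesteps(transcript):
    """Fewest time steps CTC needs to emit `transcript` as a label sequence.

    With ctc_merge_repeated=True, two identical consecutive labels must be
    separated by a blank frame, so each doubled character costs one extra step.
    """
    repeats = sum(1 for a, b in zip(transcript, transcript[1:]) if a == b)
    return len(transcript) + repeats


def fits_ctc(n_timesteps, transcript):
    """True if `n_timesteps` frames are enough to align `transcript`."""
    return n_timesteps >= min_ctc_timesteps(transcript)


print(min_ctc_timesteps("hello"))  # 6: five letters plus a blank between the two l's
print(fits_ctc(5, "hello"))        # False
```

Only the counts matter here: a slowly pronounced word gives the audio more time steps, not fewer.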
If you have libdeepspeech installed as described in the installation instructions, you can easily use deepspeech.utils.audioToInputVector in a Python script of your own to check each of your audio files and get the number of timesteps in it. You then just have to compare it with the number of characters in the corresponding transcript and delete those that do not comply with the rule (n_chars < n_timesteps).
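If libdeepspeech is not at hand, a rough version of this check can estimate the time-step count from the WAV header alone. This sketch does not call deepspeech.utils.audioToInputVector. It assumes a 25 ms window and a 10 ms step per feature frame, and a CSV with `wav_filename` and `transcript` columns; `estimate_timesteps` and `filter_csv` are hypothetical helpers. The counts are approximate, so leave some margin.

```python
import csv
import math
import wave

# Assumed feature framing (not read from DeepSpeech): 25 ms window, 10 ms step.
WIN_SEC = 0.025
STEP_SEC = 0.010


def estimate_timesteps(wav_path, stride=1):
    """Approximate number of feature frames (time steps) in a WAV file."""
    with wave.open(wav_path, "rb") as w:
        n_samples = w.getnframes()
        rate = w.getframerate()
    win = int(round(WIN_SEC * rate))
    step = int(round(STEP_SEC * rate))
    if n_samples <= win:
        frames = 1
    else:
        frames = 1 + int(math.ceil((n_samples - win) / float(step)))
    # With striding, only every `stride`-th frame is kept.
    return int(math.ceil(frames / float(stride)))


def filter_csv(in_csv, out_csv, stride=1):
    """Copy a DeepSpeech-style CSV, keeping only rows where n_chars < n_timesteps.

    Assumes 'wav_filename' and 'transcript' columns. Returns the dropped file names.
    """
    dropped = []
    with open(in_csv, newline="") as src, open(out_csv, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            n_steps = estimate_timesteps(row["wav_filename"], stride)
            if len(row["transcript"]) < n_steps:
                writer.writerow(row)
            else:
                dropped.append(row["wav_filename"])
    return dropped
```

The returned list of dropped files is worth listening to by hand: it should mostly contain empty clips or mismatched audio/transcript pairs.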
Yes, I was assuming there is a similar check in DS.
So do you mean that if the audio is stretched, like "hhhhheeelllooooooooo", while the transcript is "hello", we can get such errors?
Can I remove this check in DS?
Another attached audio has a similar case where we get this error: the speaker stretched the word "Haaagen".
valid.wav.txt
@harrypotter90 what @antho-rousseau means is that if the audio is very short, say 0.1 sec, while the text is very long, say the entire text of _War and Peace_
_"Well, Prince, so Genoa and Lucca are now just family estates of the Buonapartes. But I warn you..."_
then you'll get this type of error.
It's basically due to an incorrect audio + transcript pairing.
OK, that explains why I am getting this error for some of the audio files that are empty.
@antho-rousseau - I will implement the suggestion and remove these audios.
Thanks
@tilmankamp: No, it's a custom audio dataset created by us. That's why we are getting some of these empty/noisy audio files, which are actually very difficult to find. Hopefully the suggestion provided by @antho-rousseau will help us remove these corrupted files.
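For the empty recordings specifically, a simple energy check can flag candidates for manual review. This is a heuristic swapped in for illustration, not something DeepSpeech does. It assumes 16-bit PCM WAV files and a hypothetical -50 dBFS threshold; `rms_dbfs` and `looks_silent` are hypothetical names, and files containing only background noise may still pass the check.

```python
import array
import math
import sys
import wave


def rms_dbfs(wav_path):
    """RMS level of a 16-bit PCM WAV file in dBFS (0 dBFS = full scale)."""
    with wave.open(wav_path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM: %s" % wav_path)
        raw = w.readframes(w.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if len(samples) == 0:
        return float("-inf")
    mean_sq = sum(s * s for s in samples) / float(len(samples))
    if mean_sq == 0:
        return float("-inf")
    return 10 * math.log10(mean_sq / 32768.0 ** 2)


def looks_silent(wav_path, threshold_dbfs=-50.0):
    """Flag files whose overall RMS level is below a (hypothetical) threshold."""
    return rms_dbfs(wav_path) < threshold_dbfs
```

The threshold is a guess; sorting the whole dataset by `rms_dbfs` and listening to the quietest files is a safer way to pick it.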
Note that deepspeech.utils.audioToInputVec implements striding as an optimization, but if your dataset has a lot of very short audio files, striding may not be a good idea, since it halves the timesteps in the audio.
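A back-of-the-envelope illustration of why striding matters for short clips. It assumes one frame per 10 ms of audio and that a stride of 2 keeps every second frame; `timesteps_for_duration` is a hypothetical helper, and the numbers are not measured from DeepSpeech.

```python
def timesteps_for_duration(duration_ms, step_ms=10, stride=1):
    """Rough time-step count: one frame per `step_ms`, then every `stride`-th frame kept."""
    frames = -(-duration_ms // step_ms)  # ceiling division on integers
    return -(-frames // stride)


transcript = "hello world"
for stride in (1, 2):
    n = timesteps_for_duration(150, stride=stride)
    print("stride=%d: %d time steps, fits %r: %s" % (stride, n, transcript, len(transcript) < n))
```

Under these assumptions a 150 ms clip has 15 time steps without striding, enough for the 11-character transcript, but only 8 with a stride of 2.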
On 13 Jul 2017, at 07:46, harrypotter90 notifications@github.com wrote:
@tilmankamp :- No, its a custom audio data set created by us. Thats why we are getting some of these empty/noisy audios, which is very difficult to find actually. Hopefully suggestion provided by @antho-rousseau will help us removing these corrupted audios.