Hello, I am using the CTC loss function in my model. Everything was fine until I tried online training (batch_size = 1). The error is caused by the K.ctc_batch_cost function.
The error can be reproduced with the Keras example "image_ocr.py" by simply setting "minibatch_size = 1" in line 446 (the parameter of TextImageGenerator).
I am using keras 2.0.2 with tensorflow 1.1.0 backend.
Thank you!
Hi! I am having the same problem with Keras 2.0.6 and TensorFlow 1.2.0.
As I have to do online training, I got around the problem by making minibatches containing the same data twice, but I admit it is quite a dirty solution...
Yes, I am also having the same problem. The problem is definitely related to the conversion from dense to sparse.
Glad someone already posted this.
The specific error message is
InvalidArgumentError (see above for traceback): slice index 0 of dimension 0 out of bounds.
[[Node: ctc/scan/strided_slice = StridedSlice[Index=DT_INT32, T=DT_INT32, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1, _device="/job:localhost/replica:0/task:0/cpu:0"](ctc/scan/Shape, ctc/scan/strided_slice/stack, ctc/scan/strided_slice/stack_1, ctc/scan/strided_slice/stack_2)]]
Here is a somewhat minimal example that shows what's happening. It only occurs for batch_size exactly equal to one.
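In essence, the shape behaviour can be mirrored in plain NumPy (this is a sketch of my understanding, not the original example): with batch_size = 1, squeezing the length tensors leaves rank-0 scalars, and indexing their shape then fails in the same way as the strided_slice above.

```python
import numpy as np

# With batch_size = 1, label_length arrives with shape (1, 1).
label_length = np.ones((1, 1), dtype=np.int32)

# K.ctc_batch_cost squeezes *all* size-1 axes, leaving a rank-0 scalar.
squeezed = np.squeeze(label_length)
print(squeezed.shape)  # () -- no dimensions left

# tf.scan then effectively asks for shape(elems)[0], which has nothing
# to index for a rank-0 tensor -- the analogue of the strided_slice error.
try:
    n = squeezed.shape[0]
except IndexError as err:
    print("IndexError:", err)
```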
@fchollet It's fairly urgent for me, so if you have any pointers on where I could look, I could aid in the investigation. I tried reading the corresponding TensorFlow source code at tensorflow-master\tensorflow\core\util\strided_slice_op.cc, line 299, but as a TF beginner, progress is pretty slow so far.
Tried to get into TF debugging and was able to get this stack trace, in case it helps:
Traceback of node construction:
[...]
7: test_lstm\test_lstm.py
Line: 46
Function:
Text: "loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])"
8: Python35\lib\site-packages\keras\engine\topology.py
Line: 554
Function: __call__
Text: "output = self.call(inputs, **kwargs)"
9: Python35\lib\site-packages\keras\layers\core.py
Line: 659
Function: call
Text: "return self.function(inputs, **arguments)"
10: test_lstm\test_lstm.py
Line: 17
Function: ctc_lambda_func
Text: "return K.ctc_batch_cost(labels, y_pred, input_length, label_length)"
11: Python35\lib\site-packages\keras\backend\tensorflow_backend.py
Line: 3263
Function: ctc_batch_cost
Text: "sparse_labels = tf.to_int32(ctc_label_dense_to_sparse(y_true, label_length))"
12: Python35\lib\site-packages\keras\backend\tensorflow_backend.py
Line: 3222
Function: ctc_label_dense_to_sparse
Text: "initializer=init, parallel_iterations=1)"
13: Python35\lib\site-packages\tensorflow\python\ops\functional_ops.py
Line: 526
Function: scan
Text: "n = array_ops.shape(elems_flat[0])[0]"
14: Python35\lib\site-packages\tensorflow\python\ops\array_ops.py
Line: 509
Function: _SliceHelper
Text: "name=name)"
15: Python35\lib\site-packages\tensorflow\python\ops\array_ops.py
Line: 677
Function: strided_slice
Text: "shrink_axis_mask=shrink_axis_mask)"
16: Python35\lib\site-packages\tensorflow\python\ops\gen_array_ops.py
Line: 3744
Function: strided_slice
Text: "shrink_axis_mask=shrink_axis_mask, name=name)"
17: Python35\lib\site-packages\tensorflow\python\framework\op_def_library.py
Line: 767
Function: apply_op
Text: "op_def=op_def)"
18: Python35\lib\site-packages\tensorflow\python\framework\ops.py
Line: 2630
Function: create_op
Text: "original_op=self._default_original_op, op_def=op_def)"
19: Python35\lib\site-packages\tensorflow\python\framework\ops.py
Line: 1204
Function: __init__
Text: "self._traceback = self._graph._extract_stack() # pylint: disable=protected-access"
@fchollet I think this problem can be solved by modifying the function ctc_batch_cost() in keras/backend/tensorflow_backend.py. Take a look at the following lines:
label_length = tf.to_int32(tf.squeeze(label_length))
input_length = tf.to_int32(tf.squeeze(input_length))
If the batch_size is 1, the two tensors label_length and input_length will be rank 0 after the squeeze; however, they should be rank 1 with shape (1,). At least for the TensorFlow API ctc_loss(), the parameter sequence_length should be a 1-D tensor (see the TensorFlow CTC loss documentation), so a rank-0 input_length breaks these lines. As for label_length, the input to TensorFlow's scan() should also be at least rank 1, so the error occurs in ctc_label_dense_to_sparse().
Hence, my basic fix is to squeeze only along axis 1, that is,
label_length = tf.to_int32(tf.squeeze(label_length, axis=1))
input_length = tf.to_int32(tf.squeeze(input_length, axis=1))
This yields rank-1 tensors for every batch size. I tried this solution on my machine, and it works well!
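The difference is easy to check outside of TensorFlow too (a NumPy sketch; np.squeeze has the same semantics as tf.squeeze here):

```python
import numpy as np

for batch_size in (1, 4):
    length = np.ones((batch_size, 1), dtype=np.int32)
    # Squeezing every size-1 axis collapses the batch-1 case to rank 0 ...
    all_axes = np.squeeze(length)
    # ... while squeezing only axis 1 always leaves a rank-1 result.
    axis_1 = np.squeeze(length, axis=1)
    print(batch_size, all_axes.shape, axis_1.shape)
# 1 () (1,)
# 4 (4,) (4,)
```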
@WindQAQ @fchollet Could you please tell me what input_length and label_length specify? From the documentation it seems label_length contains the lengths of the ground-truth strings (in the case of OCR), but I'm not sure what input_length means.
You are right about label_length. input_length is the length of the input sequences; in the case of OCR it may be the length of the sequence of feature vectors created from the input image.
@xisnu The input to the LSTM is (batch_size, 26, 512) in my case and the output is (batch_size, 26, 37). So what should input_length be?
Suppose you have three samples like this
Input
a1 a2 a3
b1 b2 b3 b4
c1
Target
goat
mat
is
If you feed this to an LSTM-CTC model you should pad the inputs to equal length, so they become
Input
a1 a2 a3 PD
b1 b2 b3 b4
c1 PD PD PD
So the input to the LSTM is (3, 4, 1), but you also pass the actual *input sequence lengths* as an array [3, 4, 1], and of course the *target lengths* as another array [4, 3, 2].
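In code, the bookkeeping above could look like this (a NumPy sketch with made-up one-dimensional features; the point is that both length arrays are recorded before padding):

```python
import numpy as np

# Three variable-length input sequences (feature dim 1) and their targets.
inputs = [np.ones((3, 1)), np.ones((4, 1)), np.ones((1, 1))]
targets = ["goat", "mat", "is"]

# True lengths are recorded *before* padding.
input_length = np.array([len(x) for x in inputs])   # [3 4 1]
label_length = np.array([len(t) for t in targets])  # [4 3 2]

# Pad every input to the longest sequence -> LSTM input of shape (3, 4, 1).
max_len = input_length.max()
padded = np.zeros((len(inputs), max_len, 1))
for i, x in enumerate(inputs):
    padded[i, : len(x)] = x

print(padded.shape)  # (3, 4, 1)
```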
@saisumanth007 It should be the length of the inputs before padding, so it cannot be determined from the information you have given.
@xisnu @WindQAQ Suppose in OCR I have three images: image1, image2 and image3 with ground-truth strings "goat", "mat" and "is" respectively.
While training, I will pad the labels to the max length, i.e. 4 in this case.
So label_length = [4,3,2] --> these are the lengths before padding.
Can we determine input_length in this case?
@saisumanth007
Input to the LSTM is (batch_size, 26, 512 ) in my case
Basically, if you do not pad the inputs (that is, the feature vectors of the images), *input_length* should be an array filled with 26; it depends on whether you pad the inputs. Maybe you can describe how you extract the feature vectors from the images so that I can help you directly.
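Concretely, for the shapes mentioned above ((batch_size, 26, 512) inputs with no padding), building input_length is trivial (a sketch; note that K.ctc_batch_cost expects both length tensors with shape (batch_size, 1)):

```python
import numpy as np

batch_size = 3
time_steps = 26  # number of LSTM output steps per image

# No padding on the inputs: every sample uses all 26 time steps,
# so input_length is simply an array filled with 26.
input_length = np.full((batch_size, 1), time_steps, dtype=np.int32)
print(input_length.ravel().tolist())  # [26, 26, 26]
```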