Hello, I am using the CTC loss function in my model. Everything was fine until I tried online training (batch_size = 1). The error is caused by the K.ctc_batch_cost function.
The error can be reproduced with the Keras example "image_ocr.py" by simply setting "minibatch_size = 1" in line 446 (the parameter of TextImageGenerator).
I am using keras 2.0.2 with tensorflow 1.1.0 backend.
Thank you!
Hi! I am having the same problem with Keras 2.0.6 and TensorFlow 1.2.0.
As I have to do online training, I got around the problem by making minibatches containing the same data twice, but I admit it is quite a dirty solution...
Yes, I am also having the same problem. The problem is definitely related to the conversion from dense to sparse.
Glad someone already posted this.
The specific error message is
InvalidArgumentError (see above for traceback): slice index 0 of dimension 0 out of bounds.
[[Node: ctc/scan/strided_slice = StridedSlice[Index=DT_INT32, T=DT_INT32, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1, _device="/job:localhost/replica:0/task:0/cpu:0"](ctc/scan/Shape, ctc/scan/strided_slice/stack, ctc/scan/strided_slice/stack_1, ctc/scan/strided_slice/stack_2)]]
Here is a somewhat minimal example that shows what's happening. It only occurs for batch_size exactly equal to one.
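In essence, the shape behaviour can be mirrored in plain NumPy (this is a sketch of my understanding, not the original example): with batch_size = 1, squeezing the length tensors leaves rank-0 scalars, and indexing their shape then fails in the same way as the strided_slice above.

```python
import numpy as np

# With batch_size = 1, label_length arrives with shape (1, 1).
label_length = np.ones((1, 1), dtype=np.int32)

# K.ctc_batch_cost squeezes *all* size-1 axes, leaving a rank-0 scalar.
squeezed = np.squeeze(label_length)
print(squeezed.shape)  # () -- no dimensions left

# tf.scan then effectively asks for shape(elems)[0], which has nothing
# to index for a rank-0 tensor -- the analogue of the strided_slice error.
try:
    n = squeezed.shape[0]
except IndexError as err:
    print("IndexError:", err)
```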
@fchollet It's fairly urgent for me, so if you have any pointers on where I could look, I could aid in the investigation. I tried reading the corresponding TensorFlow source code at tensorflow-master\tensorflow\core\util\strided_slice_op.cc, line 299, but as a TF beginner, progress is pretty slow so far.
Tried to get into TF debugging and was able to get this stack trace, in case it helps:
Traceback of node construction:
[...]
7: test_lstm\test_lstm.py
Line: 46
Function:
Text: "loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])"
8: Python35\lib\site-packages\keras\engine\topology.py
Line: 554
Function: __call__
Text: "output = self.call(inputs, **kwargs)"
9: Python35\lib\site-packages\keras\layers\core.py
Line: 659
Function: call
Text: "return self.function(inputs, **arguments)"
10: test_lstm\test_lstm.py
Line: 17
Function: ctc_lambda_func
Text: "return K.ctc_batch_cost(labels, y_pred, input_length, label_length)"
11: Python35\lib\site-packages\keras\backend\tensorflow_backend.py
Line: 3263
Function: ctc_batch_cost
Text: "sparse_labels = tf.to_int32(ctc_label_dense_to_sparse(y_true, label_length))"
12: Python35\lib\site-packages\keras\backend\tensorflow_backend.py
Line: 3222
Function: ctc_label_dense_to_sparse
Text: "initializer=init, parallel_iterations=1)"
13: Python35\lib\site-packages\tensorflow\python\ops\functional_ops.py
Line: 526
Function: scan
Text: "n = array_ops.shape(elems_flat[0])[0]"
14: Python35\lib\site-packages\tensorflow\python\ops\array_ops.py
Line: 509
Function: _SliceHelper
Text: "name=name)"
15: Python35\lib\site-packages\tensorflow\python\ops\array_ops.py
Line: 677
Function: strided_slice
Text: "shrink_axis_mask=shrink_axis_mask)"
16: Python35\lib\site-packages\tensorflow\python\ops\gen_array_ops.py
Line: 3744
Function: strided_slice
Text: "shrink_axis_mask=shrink_axis_mask, name=name)"
17: Python35\lib\site-packages\tensorflow\python\framework\op_def_library.py
Line: 767
Function: apply_op
Text: "op_def=op_def)"
18: Python35\lib\site-packages\tensorflow\python\framework\ops.py
Line: 2630
Function: create_op
Text: "original_op=self._default_original_op, op_def=op_def)"
19: Python35\lib\site-packages\tensorflow\python\framework\ops.py
Line: 1204
Function: __init__
Text: "self._traceback = self._graph._extract_stack() # pylint: disable=protected-access"
@fchollet I think this problem can be solved by modifying the function ctc_batch_cost() in keras/backend/tensorflow_backend.py. Take a look at the following lines:
label_length = tf.to_int32(tf.squeeze(label_length))
input_length = tf.to_int32(tf.squeeze(input_length))
If the batch_size is 1, the two tensors label_length and input_length will be rank 0 after the squeeze; however, they should be rank 1 with shape (1,). At least for the TensorFlow API ctc_loss(), the parameter sequence_length should be a 1-D tensor (see the TensorFlow CTC loss documentation), so a rank-0 input_length breaks these lines. As for label_length, the input to TensorFlow's scan() should also be at least rank 1, so the error occurs in ctc_label_dense_to_sparse().
Hence, my basic fix is to squeeze only along axis 1, that is,
label_length = tf.to_int32(tf.squeeze(label_length, axis=1))
input_length = tf.to_int32(tf.squeeze(input_length, axis=1))
This yields rank-1 tensors for every batch size. I tried this solution on my machine, and it works well!
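The difference is easy to check outside of TensorFlow too (a NumPy sketch; np.squeeze has the same semantics as tf.squeeze here):

```python
import numpy as np

for batch_size in (1, 4):
    length = np.ones((batch_size, 1), dtype=np.int32)
    # Squeezing every size-1 axis collapses the batch-1 case to rank 0 ...
    all_axes = np.squeeze(length)
    # ... while squeezing only axis 1 always leaves a rank-1 result.
    axis_1 = np.squeeze(length, axis=1)
    print(batch_size, all_axes.shape, axis_1.shape)
# 1 () (1,)
# 4 (4,) (4,)
```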
@WindQAQ @fchollet Could you please tell me what input_length and label_length specify? From the documentation it seems label_length contains the lengths of the ground-truth strings (in the case of OCR), but I'm not sure what input_length means.
You are right about label_length. input_length is the length of the input sequences; in the case of OCR it may be the length of the sequence of feature vectors created from the input image.
@xisnu The input to the LSTM is (batch_size, 26, 512) in my case and the output is (batch_size, 26, 37). So what should input_length be?
Suppose you have three samples like this
Input
a1 a2 a3
b1 b2 b3 b4
c1
Target
goat
mat
is
If you feed this to an LSTM-CTC model you should pad the inputs to equal length, so they become
Input
a1 a2 a3 PD
b1 b2 b3 b4
c1 PD PD PD
So the input to the LSTM is (3, 4, 1), but you also pass the actual *input sequence lengths* as an array [3, 4, 1], and of course the *target lengths* as another array [4, 3, 2].
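In code, the bookkeeping above could look like this (a NumPy sketch with made-up one-dimensional features; the point is that both length arrays are recorded before padding):

```python
import numpy as np

# Three variable-length input sequences (feature dim 1) and their targets.
inputs = [np.ones((3, 1)), np.ones((4, 1)), np.ones((1, 1))]
targets = ["goat", "mat", "is"]

# True lengths are recorded *before* padding.
input_length = np.array([len(x) for x in inputs])   # [3 4 1]
label_length = np.array([len(t) for t in targets])  # [4 3 2]

# Pad every input to the longest sequence -> LSTM input of shape (3, 4, 1).
max_len = input_length.max()
padded = np.zeros((len(inputs), max_len, 1))
for i, x in enumerate(inputs):
    padded[i, : len(x)] = x

print(padded.shape)  # (3, 4, 1)
```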
@saisumanth007 It should be the length of the inputs before padding, so it cannot be determined from the information you have given.
@xisnu @WindQAQ Suppose in OCR I have three images: image1, image2 and image3 with ground-truth strings "goat", "mat" and "is" respectively.
While training, I will pad the labels to the max length, i.e. 4 in this case.
So label_length = [4,3,2] --> these are the lengths before padding.
Can we determine input_length in this case?
@saisumanth007
Input to the LSTM is (batch_size, 26, 512 ) in my case
Basically, if you do not pad the inputs (that is, the feature vectors of the images), *input_length* should be an array filled with 26; it depends on whether you pad the inputs. Maybe you can describe how you extract the feature vectors from the images so that I can help you directly.
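Concretely, for the shapes mentioned above ((batch_size, 26, 512) inputs with no padding), building input_length is trivial (a sketch; note that K.ctc_batch_cost expects both length tensors with shape (batch_size, 1)):

```python
import numpy as np

batch_size = 3
time_steps = 26  # number of LSTM output steps per image

# No padding on the inputs: every sample uses all 26 time steps,
# so input_length is simply an array filled with 26.
input_length = np.full((batch_size, 1), time_steps, dtype=np.int32)
print(input_length.ravel().tolist())  # [26, 26, 26]
```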