Deepspeech: Error using new model from training: Create kernel failed: Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op<name=TensorArrayV3;

Created on 27 Jun 2018 · 17 Comments · Source: mozilla/DeepSpeech

OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.4 LTS
Release: 16.04
Codename: xenial
TensorFlow installed from (our builds, or upstream TensorFlow):
Installed from pip3
Python version: 3.6
Exact command to reproduce:
I followed the instructions to continue training from a frozen graph, and the model was saved using the --export_dir flag. However, when I try to use the newly produced model with the deepspeech command, I get an error:

deepspeech new_model/output_graph.pb data/smoke_test/LDC93S1.wav models/alphabet.txt models/lm.binary models/trie
Loading model from file new_model/output_graph.pb
2018-06-26 18:24:28.208963: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.210s.
Loading language model from files models/lm.binary models/trie
Loaded language model in 0.799s.
Running inference.
2018-06-26 18:24:30.766930: E tensorflow/core/framework/op_segment.cc:53] Create kernel failed: Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,4096], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0". (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
2018-06-26 18:24:30.766975: E tensorflow/core/common_runtime/executor.cc:643] Executor failed to create kernel. Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,4096], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0". (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
[[Node: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,4096], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Error running session: Invalid argument: NodeDef mentions attr 'identical_element_shapes' not in Op handle:resource, flow:float; attr=dtype:type; attr=element_shape:shape,default=; attr=dynamic_size:bool,default=false; attr=clear_after_read:bool,default=true; attr=tensor_array_name:string,default=""; is_stateful=true>; NodeDef: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,4096], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0". (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
[[Node: bidirectional_rnn/bw/bw/TensorArray_1 = TensorArrayV3clear_after_read=true, dtype=DT_FLOAT, dynamic_size=false, element_shape=[?,4096], identical_element_shapes=true, tensor_array_name="bidirectional_rnn/bw/bw/dynamic_rnn/input_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
None
Inference took 1.558s for 2.925s audio file

Command used to produce the above error
deepspeech new_model/output_graph.pb data/smoke_test/LDC93S1.wav models/alphabet.txt models/lm.binary models/trie

The deepspeech command runs fine with the prebuilt model:
deepspeech models/output_graph.pb data/smoke_test/LDC93S1.wav models/alphabet.txt models/lm.binary models/trie

The command I used to build the new model was
python3 ./DeepSpeech.py --n_hidden 2048 --initialize_from_frozen_model models/output_graph.pb --checkpoint_dir fine_tuning_checkpoints --epoch 50 --train_files ./data/ep1_tracks/ep1_tracks.csv,./data/ep2_tracks/ep2_tracks.csv,./data/ep3_tracks/ep3_tracks.csv --dev_files ./data/ep4_tracks/ep4_tracks.csv --test_files ./data/ep5_tracks/ep5_tracks.csv --train_batch_size 81 --dev_batch_size 31 --test_batch_size 7 --export_dir ./new_model --validation_step 1 --learning_rate 0.0001 --decoder_library_path ./libctc_decoder_with_kenlm.so
I Initializing from frozen model: models/output_graph.pb
I STARTING Optimization
I Training of Epoch 32 - loss: 121.636536
I Validation of Epoch 32 - loss: 119.534760
I Training of Epoch 33 - loss: 102.412689
I Validation of Epoch 33 - loss: 110.989410
I Training of Epoch 34 - loss: 82.205559
I Validation of Epoch 34 - loss: 105.821556
I Training of Epoch 35 - loss: 68.980141
I Validation of Epoch 35 - loss: 103.784103
I Training of Epoch 36 - loss: 60.628181
I Validation of Epoch 36 - loss: 101.529945
I Training of Epoch 37 - loss: 52.858521
I Validation of Epoch 37 - loss: 98.544525
I Training of Epoch 38 - loss: 48.636227
I Validation of Epoch 38 - loss: 97.130226
I Training of Epoch 39 - loss: 43.446224
I Validation of Epoch 39 - loss: 95.700096
I Training of Epoch 40 - loss: 39.366005
I Validation of Epoch 40 - loss: 94.808357
I Training of Epoch 41 - loss: 35.519043
I Validation of Epoch 41 - loss: 95.092789
I Training of Epoch 42 - loss: 31.837126
I Validation of Epoch 42 - loss: 94.437584
I Training of Epoch 43 - loss: 29.049000
I Validation of Epoch 43 - loss: 97.608131
I Early stop triggered as (for last 4 steps) validation loss: 97.608131 with standard deviation: 0.268259 and mean: 94.779577
I FINISHED Optimization - training time: 0:32:14
I Test of Epoch 44 - WER: 0.731343, loss: 51.183876037597656, mean edit distance: 0.333430
I --------------------------------------------------------------------------------
I WER: 0.375000, loss: 32.807426, mean edit distance: 0.239130
I - src: "three eight seirra pappa victor three two four"
I - res: "three eight seropofla victor three two "
I --------------------------------------------------------------------------------
I WER: 0.400000, loss: 17.107998, mean edit distance: 0.160000
I - src: "four two niner break line"
I - res: "four two none break in "
I --------------------------------------------------------------------------------
I WER: 0.444444, loss: 44.881229, mean edit distance: 0.265306
I - src: "what appears to be a headquarters break line lima"
I - res: "what appears to be a quarter free lying le "
I --------------------------------------------------------------------------------
I WER: 0.833333, loss: 57.690659, mean edit distance: 0.475000
I - src: "two zero personnel line alpha patrolling"
I - res: "two there versonelllyngaulapotro"
I --------------------------------------------------------------------------------
I WER: 0.857143, loss: 63.238319, mean edit distance: 0.346667
I - src: "brown horse this is white horse standby for salute report two dash one over"
I - res: "oworcethisswhitehorsestandbyffrsoliepor two days on over "
I --------------------------------------------------------------------------------
I WER: 0.857143, loss: 68.180138, mean edit distance: 0.373333
I - src: "brown horse this is white horse standby for salute report two dash one over"
I - res: "oworcethisswitehorsestandbyffrsoliepoor two days on over "
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 74.381355, mean edit distance: 0.474576
I - src: "white horse this is brown horse standby to copy line sierra"
I - res: "i orseisofbrownorestanmonecopylineseer"
I --------------------------------------------------------------------------------
I Exporting the model...
Converted 14 variables to const ops.
I Models exported at ./new_model
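
The mean and standard deviation in the "Early stop triggered" line above can be reproduced from the log itself. The sketch below is not taken from DeepSpeech.py; it only assumes, based on the printed numbers, that the statistics cover the three validation losses before the triggering epoch and that the standard deviation is the population one. The function name early_stop_stats is made up for illustration.

```python
from statistics import mean, pstdev


def early_stop_stats(previous_losses):
    """Return (mean, population standard deviation) of the given losses."""
    return mean(previous_losses), pstdev(previous_losses)


# Validation losses of epochs 40, 41 and 42, copied from the log above.
# Epoch 43 (97.608131) is the loss that triggered the early stop.
previous = [94.808357, 95.092789, 94.437584]
m, s = early_stop_stats(previous)

# The log reports mean 94.779577 and standard deviation 0.268259.
print("mean: %.6f, std: %.6f" % (m, s))
```

Both values sit far below the triggering loss, which is consistent with validation loss starting to rise while training loss keeps falling.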

thejedi

All 17 comments

This is documented here: https://github.com/mozilla/DeepSpeech/wiki#i-get-an-error-about-e-tensorflowcoreframeworkop_segmentcc53-create-kernel-failed-invalid-argument-nodedef-mentions-attr-identical_element_shapes-when-running-inference

Graphs exported from TensorFlow r1.5+ are incompatible with the 0.1.1 release.
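
A quick way to tell whether an exported graph is affected is to search the file for the attribute name from the error message. This is a rough heuristic sketch, not something from the wiki: it assumes the attribute name is stored as a plain string inside the serialized graph, and the function name graph_mentions_attr is made up for illustration. It does not parse the graph or need TensorFlow.

```python
import mmap

# Attribute name copied from the error message in this issue.
ATTR_NAME = b"identical_element_shapes"


def graph_mentions_attr(pb_path, attr_name=ATTR_NAME):
    """Return True if the raw bytes of pb_path contain attr_name.

    Heuristic only: the file is searched as bytes, not parsed as a GraphDef.
    """
    with open(pb_path, "rb") as f:
        # Map the file read-only so a large frozen graph is not copied into memory.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
            return data.find(attr_name) != -1
```

Calling graph_mentions_attr("new_model/output_graph.pb") and graph_mentions_attr("models/output_graph.pb") would be expected to give different answers if the new export is the one carrying the attribute that the 0.1.1 binaries do not know.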

lissyx on 27 Jun 2018

👍1

So, besides using pip3 to uninstall tensorflow 1.6 and install 1.4.0, is there anything else I need to do? Do I need a different decoder library?

thejedi on 27 Jun 2018

You'll need the decoder that matches the TensorFlow version, so the one from v0.1.1; that should be all. You can also go the other way: stick to TensorFlow r1.6 as you did and rely on newer packages, since we now publish those as alphas, e.g., https://pypi.org/project/deepspeech/0.2.0a6/

lissyx on 27 Jun 2018

👍1

Yes, but I want to continue the training from the pre-built models, which are built for the 0.1.1 version of deepspeech:
wget -O - https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-models.tar.gz | tar xvfz -

Are there new pre-trained models that are compatible with v0.2.0a6?

How do I ensure that the decoder matches v0.1.1? Following the instructions, I ran:
python3 util/taskcluster.py --target .

thejedi on 27 Jun 2018

No, we have not yet released models trained on r1.6. If you used this taskcluster call, then you got the 0.2.0a6 libctc decoder. The easiest way so far is to set the TASKCLUSTER_SCHEME environment variable and run:
TASKCLUSTER_SCHEME=https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.v0.1.1.%(arch_string)s/artifacts/public/%(artifact_name)s python3 util/taskcluster.py --target .
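
The %(arch_string)s and %(artifact_name)s parts of that value are Python %-style placeholders. The sketch below only illustrates how such a template expands; it is an assumption that util/taskcluster.py fills it this way, and the function name artifact_url is made up. The values cpu and native_client.tar.xz are the ones that appear in the direct download link given later in this thread.

```python
# Template copied from the TASKCLUSTER_SCHEME value above.
TASKCLUSTER_SCHEME = (
    "https://index.taskcluster.net/v1/task/"
    "project.deepspeech.deepspeech.native_client.v0.1.1.%(arch_string)s"
    "/artifacts/public/%(artifact_name)s"
)


def artifact_url(arch_string, artifact_name, scheme=TASKCLUSTER_SCHEME):
    """Fill the two named placeholders of the scheme with %-formatting."""
    return scheme % {"arch_string": arch_string, "artifact_name": artifact_name}


print(artifact_url("cpu", "native_client.tar.xz"))
```

Because the release tag (v0.1.1) is part of the template itself, overriding the variable is what pins the download to the 0.1.1 native client.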

lissyx on 27 Jun 2018

👍1

Or what if, instead of cloning the latest branch of the git repo, I get the DeepSpeech source tarball for 0.1.1? Will I still need to set the environment variable?

https://github.com/mozilla/DeepSpeech/releases/

thejedi on 27 Jun 2018

Yes, you will; the source tarball does not help in this case because the URL is more or less hard-coded. You can also just download directly from https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.v0.1.1.cpu/artifacts/public/native_client.tar.xz

lissyx on 27 Jun 2018

🎉1

Awesome thanks!

thejedi on 27 Jun 2018

So I updated the project: the tensorflow and deepspeech versions now match the pretrained model version, and I got the native client tar file for 0.1.1. But I get an error:

 python3 ./DeepSpeech.py --n_hidden 2048 --initialize_from_frozen_model ../models/output_graph.pb --checkpoint_dir fine_tuning_checkpoints --epoch 50  --train_files ./data/training/ep1_tracks/ep1_tracks.csv,./data/training/ep2_tracks/ep2_tracks.csv,./data/training/ep3_tracks/ep3_tracks.csv --dev_files ./data/training/ep4_tracks/ep4_tracks.csv --test_files ./data/training/ep5_tracks/ep5_tracks.csv --train_batch_size 81 --dev_batch_size 33 --test_batch_size 34 --export_dir ./new_model --validation_step 1 --learning_rate 0.0001 --decoder_library_path native_client/libctc_decoder_with_kenlm.so
I Initializing from frozen model: ../models/output_graph.pb
I STARTING Optimization
I Training of Epoch 0 - loss: 122.116417
I Validation of Epoch 0 - loss: 104.637581
I Training of Epoch 1 - loss: 82.550423
I Validation of Epoch 1 - loss: 96.167656
I Training of Epoch 2 - loss: 67.309082
I Validation of Epoch 2 - loss: 93.426674
I Training of Epoch 3 - loss: 56.866405
I Validation of Epoch 3 - loss: 94.765869
I Training of Epoch 4 - loss: 47.820656
I Validation of Epoch 4 - loss: 88.682167
I Training of Epoch 5 - loss: 40.196442
I Validation of Epoch 5 - loss: 90.572487
I Training of Epoch 6 - loss: 34.966114
I Validation of Epoch 6 - loss: 91.899178
I Training of Epoch 7 - loss: 29.972021
I Validation of Epoch 7 - loss: 89.163513
I Training of Epoch 8 - loss: 26.040102
I Validation of Epoch 8 - loss: 91.554955
I Training of Epoch 9 - loss: 22.823648
I Validation of Epoch 9 - loss: 92.861061
I Early stop triggered as (for last 4 steps) validation loss: 92.861061 with standard deviation: 1.216614 and mean: 90.872548
I FINISHED Optimization - training time: 0:37:55
Loading the LM will be faster if you build a binary file.
Reading data/lm/lm.binary
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
  what():  native_client/kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece&, std::vector<long unsigned int>&) threw FormatLoadException.
first non-empty line was "version https://git-lfs.github.com/spec/v1" not \data\. Byte: 43
Aborted (core dumped)

I did run git lfs install:
git-lfs/2.4.2 (GitHub; linux amd64; go 1.8.3)
git version 2.17.1

thejedi on 27 Jun 2018

Obviously, the lm file has not been properly checked out by git lfs. You might need to reclone.
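
The first line quoted in the error ("version https://git-lfs.github.com/spec/v1") is how a Git LFS pointer file starts, so the problem can be confirmed before starting a long training run. A minimal sketch, with a made-up function name is_git_lfs_pointer:

```python
# First line of a Git LFS pointer file, as quoted in the error message above.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_git_lfs_pointer(path):
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

If is_git_lfs_pointer("data/lm/lm.binary") returns True, the file is still the small text placeholder and the real language model has not been downloaded.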

lissyx on 27 Jun 2018

👍1

I am getting a new error after removing DeepSpeech and cloning it again:

I Early stop triggered as (for last 4 steps) validation loss: 94.556023 with standard deviation: 1.178466 and mean: 89.540270
I FINISHED Optimization - training time: 0:29:53
terminate called after throwing an instance of 'lm::FormatLoadException'
  what():  native_client/kenlm/lm/binary_format.cc:131 in void lm::ngram::MatchCheck(lm::ngram::ModelType, unsigned int, const lm::ngram::Parameters&) threw FormatLoadException.
The binary file was built for trie with quantization and array-compressed pointers but the inference code is trying to load probing hash tables
Aborted (core dumped)

The command I ran from the DeepSpeech directory:

python3 ./DeepSpeech.py --n_hidden 2048 --initialize_from_frozen_model ../models/output_graph.pb --checkpoint_dir fine_tuning_checkpoints --epoch 50  --train_files ./data/training/ep1_tracks/ep1_tracks.csv,./data/training/ep2_tracks/ep2_tracks.csv,./data/training/ep3_tracks/ep3_tracks.csv --dev_files ./data/training/ep4_tracks/ep4_tracks.csv --test_files ./data/training/ep5_tracks/ep5_tracks.csv --train_batch_size 81 --dev_batch_size 33 --test_batch_size 34 --export_dir ./new_model --validation_step 1 --learning_rate 0.0001 --decoder_library_path ./native_client_0.1.1/libctc_decoder_with_kenlm.so

Do I need to clone a different branch rather than just cloning:
git clone https://github.com/mozilla/DeepSpeech

thejedi on 29 Jun 2018

It looks like you are not using the proper language model / trie file. The file format changed recently. What is the size of your data/lm/lm.binary file?

Are you sure you are not running with v0.1.1 binaries somewhere? If you use the v0.1.1 binaries / model, please check out v0.1.1; otherwise you are training in an incompatible way and using the wrong data/lm, which explains your error.

lissyx on 29 Jun 2018

I removed the local repo again and did a git clone https://github.com/mozilla/DeepSpeech --branch v0.1.1

Now I do not have an lm.binary in that folder.

thejedi on 29 Jun 2018

I do not know whether Git LFS should have fetched it automatically. Try forcing it with git lfs fetch && git lfs checkout

lissyx on 29 Jun 2018

👍1

I ran those two commands and now there is an lm.binary file in there:
-rw-rw-r-- 1 user user 1601028778 Jun 29 11:58 lm.binary
-rw-rw-r-- 1 user user 43550345 Jun 29 10:28 trie
-rw-rw-r-- 1 user user 12124740 Jun 29 10:27 vocab.txt

I am running the training now to see if I still get the error.

thejedi on 29 Jun 2018

The training finished and the model was exported. I did get some warnings:
I Exporting the model...
I Restored checkpoint at training epoch 16
WARNING:tensorflow:From ./DeepSpeech.py:1746: generic_signature (from tensorflow.contrib.session_bundle.exporter) is deprecated and will be removed after 2017-06-30.
Instructions for updating:
No longer supported. Switch to SavedModel immediately.
WARNING:tensorflow:From ./DeepSpeech.py:1747: Exporter.__init__ (from tensorflow.contrib.session_bundle.exporter) is deprecated and will be removed after 2017-06-30.
Instructions for updating:
No longer supported. Switch to SavedModel immediately.
WARNING:tensorflow:From ./DeepSpeech.py:1756: Exporter.export (from tensorflow.contrib.session_bundle.exporter) is deprecated and will be removed after 2017-06-30.
Instructions for updating:
No longer supported. Switch to SavedModel immediately.
Converted 14 variables to const ops.
I Models exported at ./new_model