Deepspeech: Make model convertible by CoreML

Created on 21 Jun 2017 · 74 Comments · Source: mozilla/DeepSpeech

It would be wonderful if DeepSpeech models could be converted to CoreML, for offline use in apps. Here is documentation to do just that. https://developer.apple.com/documentation/coreml/converting_trained_models_to_core_ml Thanks!

P4 enhancement

All 74 comments

@MatthewWaller It doesn't appear as if TensorFlow is supported.

Hmmm, so I would probably need to write a custom conversion tool, like it says at the bottom of the page, I guess.

@MatthewWaller I'd guess so. Which seems like a large outlay in time and knowledge.

Just as an update, I'm examining this Keras model converter script, which comes from Apple's own Python CoreML tools. Could be a good precedent for defining the needed layers. You're right though @kdavis-mozilla, looks like a large project.

@MatthewWaller Thanks for the update!

@MatthewWaller Have you been able to make any progress ?

I think it might be possible to convert with a third party tool. I haven’t written the python conversion scripts myself, but this could be useful (https://github.com/Microsoft/MMdnn/blob/master/README.md). But that’s only half the battle. Then I need to find out how to preprocess the audio, so I’m trying to find out how to get MFCC in Swift. One developer used a C library to do this in iOS, so that might be the way to go.

@MatthewWaller I lack context here, but we have MFCC computation in C already, can't you leverage that?

If I want to use any C libraries, I have to port them over to Objective-C or Swift to use them in iOS or macOS. That’s something I haven’t done yet, and I would prefer to do the calculations all in Swift, for longevity’s sake.

@MatthewWaller I came across https://github.com/tf-coreml/tf-coreml while looking at some TensorFlow Lite stuff. Isn't it already addressing what you want to do?

It does! I hadn’t seen that one. Well, hopefully we just need to get the MFCC one way or another. I’ve got a couple of projects in the hopper before I get back to this one, but that’s exciting!

@lissyx I managed to feed mfcc into a core data model, but I'm not sure where to go to implement the link you sent to convert to coreml, specifically, I'm not sure where to find a list of output tensor names present in the TF graph, (the README.md gives an example of output_feature_names = ['softmax:0'])

Any ideas? Would welcome your help as well @kdavis-mozilla !
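One way to narrow down candidate output names is to look for nodes that no other node consumes. The sketch below does this on a toy graph represented as a plain dict rather than a real GraphDef; the dict, the `lstm` node and the helper names are assumptions for illustration. The only TensorFlow convention relied on is that a node's input references may carry a `^` control-dependency marker or a `:N` output index.

```python
def producer_name(input_ref):
    """Strip the control-dependency marker and output index from an input reference."""
    return input_ref.lstrip("^").split(":")[0]


def candidate_outputs(nodes):
    """Return names of nodes that no other node consumes, in insertion order."""
    consumed = {producer_name(ref) for inputs in nodes.values() for ref in inputs}
    return [name for name in nodes if name not in consumed]


def tensor_name(op_name, index=0):
    """Turn an op name into the 'op:index' tensor-name form used in the README example."""
    return "{}:{}".format(op_name, index)


# Toy graph (hypothetical): node name -> list of input references.
toy_graph = {
    "input_node": [],
    "previous_state_c": [],
    "lstm": ["input_node", "previous_state_c"],
    "new_state_c": ["lstm:1"],
    "logits": ["lstm"],
}

for op in candidate_outputs(toy_graph):
    print(tensor_name(op))
```

On a real frozen graph the same idea would apply to each node's name and input list, though unconsumed nodes are only candidates and still need checking by hand.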

Hm I remember documenting that to someone else needing to access some intermediate tensor, on discourse. You should have a look there, I cannot search for it for the moment, I'll try and find it tomorrow if you don't find :-)

@MatthewWaller Any news on that ? The upcoming #1463 might benefit from such support

@lissyx unfortunately I haven't been able to convert to CoreML. The https://github.com/tf-coreml/tf-coreml converter, which Apple also recommends officially, cannot handle cycles. I tried and got the error, and as a limitation it states: "TF graph must be cycle free (cycles are generally created due to control flow ops like if, while, map, etc.)"

Not sure how to get around this at present. You can see my issue here: https://github.com/tf-coreml/tf-coreml/issues/124

The author states: "I think the simplest way to deal with such graphs for now is to abstract the weight matrices and bias vectors from pre-trained TF. And then use them to build a CoreML model directly using the neural network builder API provided by coremltools." But I'm not sure how to practically go about that.

@MatthewWaller Did you try CoreML on the PR or on master?

I think some, maybe all, cycles should be removed in the PR.

I tried an earlier version on master. Is there a pre-trained model I could use? I see an alpha in the release from 3 days ago. Would that work?

@reuben Can you give @MatthewWaller a preliminary model for the PR to test CoreML?

@MatthewWaller The alpha release is only for the inference binaries, so far, it does not bundle any model change.

@MatthewWaller a preliminary model can be found here: https://github.com/reuben/DeepSpeech/releases/tag/v0.0.1-alpha

@MatthewWaller Were you ever able to find the output_feature_names?

@wshamp I was. I found them to be 'logits:0'. As an update overall, I got the model, but I'm stumped at FailedPreconditionError. Here is the issue I filed with tf-coreml. The full stack trace and my full code for converting is there so far. I haven't heard back yet, but anyone else can troubleshoot as well :)

Hmm, my quick Google search on that error seems to indicate an issue with the graph initializing variables, not with the converter. I hit the same error.

@MatthewWaller The branch stores the decoder state in the graph in the variables previous_state_c and previous_state_h. It's a convenient place to store this state info.

As far as I understand, @reuben correct me if I'm mistaken, in exporting[1] the graph the previous_state_c and previous_state_h should be removed[2] or at least not included.

Maybe the model @reuben provided mistakenly included previous_state_c and previous_state_h?

That blacklist doesn't remove previous_state_{c,h}, but rather makes the freezing process ignore them, since I want them to be variables (not constants) in the final exported graph.

The idea is that before you start feeding audio features and fetching the logits tensor, you have to run the initialize_state op (see the create_inference_graph function in DeepSpeech.py[1]).

In our C++ code we do it inside DS_SetupStream (deepspeech.cc[2]).

[1] https://github.com/mozilla/DeepSpeech/blob/7b873365f8bfffe2ea84dcd34058b537e9095765/DeepSpeech.py#L1718-L1756
[2] https://github.com/mozilla/DeepSpeech/blob/7b873365f8bfffe2ea84dcd34058b537e9095765/native_client/deepspeech.cc#L567

Some other notes: the graph in that URL uses LSTMBlockFusedCell, which is probably not supported by tf-coreml, but the weights are compatible with a normal LSTMCell, so with a bit of massaging on the saver when importing, you can use a static_rnn + LSTMCell.

If you can't workaround the previous_state_{c,h} thing, an alternative is fetching the state and feeding it back every time, eliminating the need for the variable.
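The fetch-and-feed-back alternative can be sketched without TensorFlow. In this minimal sketch, `toy_step` is a hypothetical stand-in for one inference call (it is not the DeepSpeech model), and only the names `previous_state_c`, `previous_state_h`, `new_state_c` and `new_state_h` come from this thread.

```python
def toy_step(features, previous_state_c, previous_state_h):
    """Hypothetical stand-in for one inference call; NOT the DeepSpeech model."""
    new_state_c = previous_state_c + sum(features)
    new_state_h = previous_state_h + len(features)
    logits = new_state_c * 0.5
    return logits, new_state_c, new_state_h


def run_stream(chunks, step=toy_step):
    """Feed chunks one by one, passing each call's output state into the next call."""
    state_c, state_h = 0.0, 0.0  # the caller owns the state, not a graph variable
    outputs = []
    for chunk in chunks:
        logits, state_c, state_h = step(chunk, state_c, state_h)
        outputs.append(logits)
    return outputs, (state_c, state_h)


outputs, final_state = run_stream([[1.0, 2.0], [3.0]])
print(outputs, final_state)
```

Because the state lives in the caller, the exported graph would need only placeholders for the previous state and plain outputs for the new one, with no variable to initialize.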

static_rnn uses tf.cond OPs when you specify the sequence lengths. If tf.cond OPs are not supported by CoreML, you could try not passing sequence lengths to the RNN. It'll degrade the accuracy, but maybe only by a bit.

Let me know if you run into any other issues.

@MatthewWaller We now have TF Lite support, can it be helpful?

For sure @lissyx ! Here is the official Google page about being able to convert Tensorflow Lite to CoreML.

@MatthewWaller I think you forgot to add the link.

Oops, Yep. Here it is @lissyx and @kdavis-mozilla https://developers.googleblog.com/2017/12/announcing-core-ml-support.html

Have any of you attempted a CoreML conversion yet?

I've not. Maybe @lissyx has?

Let's try?

Well, except I have no iOS device to test that after :)

I can beta test for you :)

E Unsupported Ops of type: Unpack

:'(

Might be similar to the requirements for the Android NNAPI.

So, contrary to Android, we can use StridedSlice, but then it fails:

[...]
131/402: Analysing op name: previous_state_h ( type:  Placeholder )
Skipping name of placeholder
132/402: Analysing op name: previous_state_c ( type:  Placeholder )
Skipping name of placeholder
133/402: Analysing op name: input_node ( type:  Placeholder )
Skipping name of placeholder
134/402: Analysing op name: transpose ( type:  Transpose )
Traceback (most recent call last):
  File "DeepSpeech.py", line 971, in <module>
    tf.app.run(main)
  File "/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/tf-venv/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "DeepSpeech.py", line 964, in main
    export()
  File "DeepSpeech.py", line 855, in export
    'previous_state_h:0': [1,2048],
  File "/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/tf-venv/lib/python3.5/site-packages/tfcoreml/_tf_coreml_converter.py", line 586, in convert
    custom_conversion_functions=custom_conversion_functions)
  File "/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/tf-venv/lib/python3.5/site-packages/tfcoreml/_tf_coreml_converter.py", line 337, in _convert_pb_to_mlmodel
    convert_ops_to_layers(context)
  File "/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/tf-venv/lib/python3.5/site-packages/tfcoreml/_ops_to_layers.py", line 178, in convert_ops_to_layers
    translator(op, context)
  File "/home/alexandre/Documents/codaz/Mozilla/DeepSpeech/tf-venv/lib/python3.5/site-packages/tfcoreml/_layers.py", line 992, in transpose
    assert axes[0] == 0, "only works for 4D tensor without batch axis"
AssertionError: only works for 4D tensor without batch axis

Removing transpose and using input_reshaped:0 as input node yields:

AssertionError: Strided Slice case not handled. Input shape = [16, 1, 2048], output shape = [1, 2048]

Hi @lissyx, I've started using a beta version of the Tensorflow to CoreML Converter that was announced today. Is there a way to get ahold of the TensorFlow Lite version of the .pb file? They have a ton of new layers and such that could help.

Yes, just --export_dir path/to/export --export_tflite

"DeepSpeech" was spotted on one of the slides in the WWDC 2019 - Platforms State of the Union. I
believe there are no blockers anymore.

@fotiDim Do you have a link or screen shot?

@kdavis-mozilla Yep! Correction, was in the Platforms State of the Union presentation (at 1:21:55).
[Two screenshots attached: 2019-06-05 at 11:32:17 and 11:32:21]

iOS 13 also does offline speech recognition so perhaps they are using DeepSpeech under the hood now. Otherwise why put it on the screen?

@fotiDim @kdavis-mozilla I tried using their software to convert DeepSpeech from the 0.4.1 release and it failed (there is a new tfconverter), so maybe Apple ran their own version of Baidu’s architecture. I haven’t tried converting the tflite model, though.

@lissyx I'm getting word that "ds_ctcdecoder-0.4.1-cp27-cp27mu-macosx_10_10_x86_64.whl is not a supported wheel on this platform." when trying to get DeepSpeech running. Any thoughts? Or alternatively, I could accept the already exported TFLite model and try to convert it. Would be great to get DeepSpeech up and running on this laptop though.

Can you share more verbose pip install steps? Can you make sure your pip is recent enough ?

@MatthewWaller In case it's a bug in selecting matching package, you can try others from https://tools.taskcluster.net/index/project.deepspeech.deepspeech.native_client.v0.4.1/osx-ctc

@lissyx getting closer. I used ds_ctcdecoder-0.4.1-cp27-cp27m-macosx_10_10_x86_64.whl and this seems to work.

I'm working with the 0.4.1 release. I downloaded the checkpoint and the source code for that release.

To export, I use ./DeepSpeech.py --checkpoint_dir deepspeech-0.4.1-checkpoint/ --nouse_seq_length --export_tflite --export_dir ./

But this fails at def preprocess(csv_files, batch_size, numcep, numcontext, alphabet, hdf5_cache_path=None): in the preprocess.py because my csv_files are blank. That comes from FLAGS.train_cached_features_path being blank for line 388 of DeepSpeech.py

Pass --notrain --nodev --notest to skip the training/testing phases and just do the export.
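Putting the two comments together, the export-only invocation can be assembled as below. This is a sketch that only builds and prints the command line; every flag is taken from this thread, while the helper name `build_export_command` is made up for illustration.

```python
import shlex


def build_export_command(checkpoint_dir, export_dir, tflite=True):
    """Build the argv list for an export-only run of DeepSpeech.py."""
    args = [
        "./DeepSpeech.py",
        "--checkpoint_dir", checkpoint_dir,
        "--nouse_seq_length",
        "--notrain", "--nodev", "--notest",  # skip the training/testing phases
        "--export_dir", export_dir,
    ]
    if tflite:
        args.append("--export_tflite")
    return args


cmd = build_export_command("deepspeech-0.4.1-checkpoint/", "./")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the arguments in a list means the same value can be handed to `subprocess.run` without shell quoting issues.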

Thank you @reuben . I was able to convert to Tensorflow Lite!

So I've got a Jupyter notebook going.

I install dependencies in the terminal first:

# pip install coremltools==3.0b1
# pip install tfcoreml==0.4.0b1
# pip install sympy

And that works well to bring in the new beta versions

Then I import tfcoreml as tf_converter

Then I use this:

tf_converter.convert(tf_model_path = 'path/to/tensorflowlitemodel',
                     mlmodel_path = 'path/to/export/to',
                     output_feature_names = ['logits:0', 'new_state_c:0', 'new_state_h:0'],
                     input_name_shape_dict = { 
                         'input_reshaped:0': [16, 494],
                         'previous_state_c:0': [1,2048],
                         'previous_state_h:0': [1,2048]
                     },
                     use_coreml_3 = True)

This starts working, but then dies, apparently because logits:0 may not be right in the output_feature_names

Here is the full output. Ideas about what I might need to have in output_feature_names?

0 assert nodes deleted
0 nodes deleted
0 nodes deleted
0 nodes deleted
0 disconnected nodes deleted
0 identity nodes deleted
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-5b0c391d0611> in <module>()
      7                          'previous_state_h:0': [1,2048]
      8                      },
----> 9                      use_coreml_3 = True)

/anaconda2/envs/CoreMLConversion/lib/python2.7/site-packages/tfcoreml/_tf_coreml_converter.pyc in convert(tf_model_path, mlmodel_path, output_feature_names, input_name_shape_dict, image_input_names, is_bgr, red_bias, green_bias, blue_bias, gray_bias, image_scale, class_labels, predicted_feature_name, predicted_probabilities_output, add_custom_layers, custom_conversion_functions, use_coreml_3)
    592         tf_model_path,
    593         inputs=input_name_shape_dict,
--> 594         outputs=output_feature_names)
    595     if mlmodel_path is not None:
    596       mlmodel.save(mlmodel_path)

/anaconda2/envs/CoreMLConversion/lib/python2.7/site-packages/coremltools/converters/tensorflow/_tf_converter.pyc in convert(filename, inputs, outputs, **kwargs)
     15     try:
     16         from ..nnssa.coreml.ssa_converter import ssa_convert
---> 17         mlmodelspec = ssa_convert(ssa, top_func='main', inputs=inputs, outputs=outputs)
     18     except ImportError as err:
     19         raise ImportError("Backend converter not found! Error message:\n%s" % err)

/anaconda2/envs/CoreMLConversion/lib/python2.7/site-packages/coremltools/converters/nnssa/coreml/ssa_converter.pyc in ssa_convert(ssa, top_func, inputs, outputs)
     59         graphviz.Source(dot_string).view(filename='/tmp/ssa_after_passes')
     60 
---> 61     converter = SSAConverter(ssa, top_func=top_func, inputs=inputs, outputs=outputs)
     62     converter.convert()
     63     mlmodel_spec = converter.get_spec()

/anaconda2/envs/CoreMLConversion/lib/python2.7/site-packages/coremltools/converters/nnssa/coreml/ssa_converter.pyc in __init__(self, net_ensemble, top_func, inputs, outputs)
    127             for name in outputs:
    128                 if name not in top_output_names:
--> 129                     raise ValueError('Output "%s" is not a nnssa output.' % name)
    130 
    131         top_output_features = list(zip(top_output_names, [None] * len(top_output_names)))

ValueError: Output "logits:0" is not a nnssa output.

Can you see what it thinks are the outputs? I.e. print the contents of top_output_names.
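The check that raises in the traceback is a simple membership test, so it can be reproduced outside the converter once the real list is known. The sketch below uses a hypothetical `top_output_names` (the thread does not show its actual contents) and also tries the requested names with the `:0` index stripped. That the suffix caused the error is an assumption, although the conversion call that succeeds later in this thread does pass names without it.

```python
def missing_outputs(requested, top_output_names):
    """Mirror the converter's check: which requested names are not known outputs?"""
    return [name for name in requested if name not in top_output_names]


def strip_output_index(name):
    """Turn 'logits:0' into 'logits'; leave names without a numeric index alone."""
    op, sep, index = name.rpartition(":")
    return op if sep and index.isdigit() else name


requested = ['logits:0', 'new_state_c:0', 'new_state_h:0']
# Hypothetical contents; print the real top_output_names to see the actual values.
top_output_names = ['logits', 'new_state_c', 'new_state_h']

print(missing_outputs(requested, top_output_names))
print(missing_outputs([strip_output_index(n) for n in requested], top_output_names))
```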

Alright, I met with Apple engineers, and showed them what I have. Apparently it is not possible to convert Tensorflow Lite to CoreML, only full Tensorflow .pb files. This article ( https://venturebeat.com/2017/12/05/google-brings-core-ml-support-to-tensorflow-lite/ ) is incorrect. I can see the misunderstanding given the official Google post linked in it, but no, only full Tensorflow models are convertible.

So I showed them the stack traces and steps I took to convert, and they said it just looks like a bug in the beta software. So at the engineer's request, I both filed a RADAR with Apple, and opened this issue on the converter repository: https://github.com/tf-coreml/tf-coreml/issues/309

Fingers crossed that once it's out of beta, we're in the clear!

Nice, thanks for looking into this!

I'm able to get a CoreML model out of the converter using latest master by removing the MFCC feature computation subgraph as well as the model_metadata node. Here's what I did:

$ cd DeepSpeech
$ cat <<EOF | patch -p1 -
diff --git a/DeepSpeech.py b/DeepSpeech.py
index 19e16d3..15b4c10 100755
--- a/DeepSpeech.py
+++ b/DeepSpeech.py
@@ -597,10 +597,10 @@ def create_inference_graph(batch_size=1, n_steps=16, tflite=False):
     batch_size = batch_size if batch_size > 0 else None

     # Create feature computation graph
-    input_samples = tfv1.placeholder(tf.float32, [Config.audio_window_samples], 'input_samples')
-    samples = tf.expand_dims(input_samples, -1)
-    mfccs, _ = samples_to_mfccs(samples, FLAGS.audio_sample_rate)
-    mfccs = tf.identity(mfccs, name='mfccs')
+    # input_samples = tfv1.placeholder(tf.float32, [Config.audio_window_samples], 'input_samples')
+    # samples = tf.expand_dims(input_samples, -1)
+    # mfccs, _ = samples_to_mfccs(samples, FLAGS.audio_sample_rate)
+    # mfccs = tf.identity(mfccs, name='mfccs')

     # Input tensor will be of shape [batch_size, n_steps, 2*n_context+1, n_input]
     # This shape is read by the native_client in DS_CreateModel to know the
@@ -667,7 +667,7 @@ def create_inference_graph(batch_size=1, n_steps=16, tflite=False):
         'input': input_tensor,
         'previous_state_c': previous_state_c,
         'previous_state_h': previous_state_h,
-        'input_samples': input_samples,
+        # 'input_samples': input_samples,
     }

     if not FLAGS.export_tflite:
@@ -677,7 +677,7 @@ def create_inference_graph(batch_size=1, n_steps=16, tflite=False):
         'outputs': logits,
         'new_state_c': new_state_c,
         'new_state_h': new_state_h,
-        'mfccs': mfccs,
+        # 'mfccs': mfccs,
     }

     return inputs, outputs, layers
@@ -699,7 +699,15 @@ def export():
     output_names = ",".join(output_names_tensors + output_names_ops)

     # Create a saver using variables from the above newly created graph
-    saver = tfv1.train.Saver()
+    # Create a saver using variables from the above newly created graph
+    def fixup(name):
+        if name.startswith('cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/'):
+            return name.replace('cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/', 'lstm_fused_cell/')
+        return name
+    mapping = {fixup(v.op.name): v for v in tfv1.global_variables()}
+    import pprint
+    pprint.pprint(mapping)
+    saver = tfv1.train.Saver(mapping)

     # Restore variables from training checkpoint
     checkpoint = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
@@ -741,14 +749,14 @@ def export():
             frozen_graph.version = int(file_relative_read('GRAPH_VERSION').strip())

             # Add a no-op node to the graph with metadata information to be loaded by the native client
-            metadata = frozen_graph.node.add()
-            metadata.name = 'model_metadata'
-            metadata.op = 'NoOp'
-            metadata.attr['sample_rate'].i = FLAGS.audio_sample_rate
-            metadata.attr['feature_win_len'].i = FLAGS.feature_win_len
-            metadata.attr['feature_win_step'].i = FLAGS.feature_win_step
-            if FLAGS.export_language:
-                metadata.attr['language'].s = FLAGS.export_language.encode('ascii')
+            # metadata = frozen_graph.node.add()
+            # metadata.name = 'model_metadata'
+            # metadata.op = 'NoOp'
+            # metadata.attr['sample_rate'].i = FLAGS.audio_sample_rate
+            # metadata.attr['feature_win_len'].i = FLAGS.feature_win_len
+            # metadata.attr['feature_win_step'].i = FLAGS.feature_win_step
+            # if FLAGS.export_language:
+            #     metadata.attr['language'].s = FLAGS.export_language.encode('ascii')

             with open(output_graph_path, 'wb') as fout:
                 fout.write(frozen_graph.SerializeToString())
EOF
$ python DeepSpeech.py --checkpoint_dir ~/Downloads/deepspeech-0.5.1-checkpoint --n_hidden 2048 --export_dir ~/Downloads/v0.5.1-reexport-coreml
$ ipython
Python 3.7.4 (default, Jul  9 2019, 18:13:23)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.6.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import tfcoreml as tf_converter

In [2]: tf_converter.convert(tf_model_path='/Users/reubenmorais/Downloads/v0.5.1-reexport-coreml/output_graph.pb', mlmodel_path='/Users/reubenmorais/Downloads/v0.5.1-reexport-coreml/deepspeech.mlmodel', output_feature_names=['logits', 'new_state_c', 'new_state_h'], input_
   ...: name_shape_dict={ 'input_node': [1,16,19,26], 'input_lengths': [1],  'previous_state_c': [1,2048], 'previous_state_h': [1,2048]}, use_coreml_3=True)
W0723 10:22:30.756310 4457440704 deprecation_wrapper.py:119] From /Users/reubenmorais/.local/share/virtualenvs/DeepSpeech-s4g1Z3_U/lib/python3.7/site-packages/coremltools/converters/nnssa/frontend/tensorflow/graphdef_to_ssa.py:21: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

W0723 10:22:30.756505 4457440704 deprecation_wrapper.py:119] From /Users/reubenmorais/.local/share/virtualenvs/DeepSpeech-s4g1Z3_U/lib/python3.7/site-packages/coremltools/converters/nnssa/frontend/tensorflow/graphdef_to_ssa.py:22: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

0 assert nodes deleted
W0723 10:22:33.274496 4457440704 deprecation_wrapper.py:119] From /Users/reubenmorais/.local/share/virtualenvs/DeepSpeech-s4g1Z3_U/lib/python3.7/site-packages/coremltools/converters/nnssa/frontend/tensorflow/graph_pass/constant_propagation.py:62: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2019-07-23 10:22:33.274982: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
['cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/SequenceMask/Const:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/range_1/limit:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/read:0', 'layer_1/weights/read:0', 'layer_1/bias:0', 'layer_3/weights:0', 'layer_5/bias:0', 'layer_5/weights:0', 'layer_6/weights/read:0', 'layer_5/bias/read:0', 'layer_6/weights:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/transpose/perm:0', 'layer_3/bias:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/range_1:0', 'Minimum_3/y:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/concat_1/axis:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/zeros:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/range/limit:0', 'Reshape_1/shape:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/Const:0', 'layer_6/bias:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel:0', 'layer_3/bias/read:0', 'layer_1/weights:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/Tile/multiples:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/SequenceMask/Const_2:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias:0', 'raw_logits/shape:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/range/delta:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/ExpandDims_2/dim:0', 'Reshape/shape:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/SequenceMask/Range:0', 'Minimum/y:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/concat/axis:0', 'layer_3/weights/read:0', 'layer_6/bias/read:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/SequenceMask/ExpandDims/dim:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/range:0', 
'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/range_1/delta:0', 'layer_5/weights/read:0', 'layer_1/bias/read:0', 'transpose/perm:0', 'Minimum_1/y:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/SequenceMask/Const_1:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/zeros/shape_as_tensor:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/ExpandDims_1/dim:0', 'layer_2/bias:0', 'Minimum_2/y:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/range_1/start:0', 'layer_2/bias/read:0', 'layer_2/weights/read:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/read:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/range/start:0', 'Reshape_2/shape:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/zeros/Const:0', 'layer_2/weights:0', 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/ExpandDims/dim:0']
23 nodes deleted
0 nodes deleted
0 nodes deleted
[Op Fusion] fuse_bias_add() deleted 10 nodes.
6 identity nodes deleted
5 disconnected nodes deleted
[SSAConverter] Converting function main ...
[SSAConverter] [1/64] Converting op input_node: Placeholder
[SSAConverter] [2/64] Converting op input_lengths: Placeholder
[SSAConverter] [3/64] Converting op previous_state_c: Placeholder
[SSAConverter] [4/64] Converting op previous_state_h: Placeholder
[SSAConverter] [5/64] Converting op transpose/perm: Const
[SSAConverter] [6/64] Converting op Reshape/shape: Const
[SSAConverter] [7/64] Converting op Minimum/y: Const
[SSAConverter] [8/64] Converting op Minimum_1/y: Const
[SSAConverter] [9/64] Converting op Minimum_2/y: Const
[SSAConverter] [10/64] Converting op Reshape_1/shape: Const
[SSAConverter] [11/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/read: Const
[SSAConverter] [12/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/read: Const
[SSAConverter] [13/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/SequenceMask/Range: Const
[SSAConverter] [14/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/SequenceMask/ExpandDims/dim: Const
[SSAConverter] [15/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/transpose/perm: Const
[SSAConverter] [16/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/ExpandDims/dim: Const
[SSAConverter] [17/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/Tile/multiples: Const
[SSAConverter] [18/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/ExpandDims_1/dim: Const
[SSAConverter] [19/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/concat/axis: Const
[SSAConverter] [20/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/ExpandDims_2/dim: Const
[SSAConverter] [21/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/concat_1/axis: Const
[SSAConverter] [22/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/range: Const
[SSAConverter] [23/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/range_1: Const
[SSAConverter] [24/64] Converting op Reshape_2/shape: Const
[SSAConverter] [25/64] Converting op Minimum_3/y: Const
[SSAConverter] [26/64] Converting op raw_logits/shape: Const
[SSAConverter] [27/64] Converting op transpose: Transpose
[SSAConverter] [28/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/SequenceMask/ExpandDims: ExpandDims
[SSAConverter] [29/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/ExpandDims_1: ExpandDims
[SSAConverter] [30/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/ExpandDims_2: ExpandDims
[SSAConverter] [31/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/stack: Pack
[SSAConverter] [32/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/stack_1: Pack
[SSAConverter] [33/64] Converting op Reshape: Reshape
[SSAConverter] [34/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/SequenceMask/Cast: Cast
[SSAConverter] [35/64] Converting op MatMul: MatMul
[SSAConverter] [36/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/SequenceMask/Less: Less
[SSAConverter] [37/64] Converting op Relu: Relu
[SSAConverter] [38/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/SequenceMask/Cast_1: Cast
[SSAConverter] [39/64] Converting op Minimum: Minimum
[SSAConverter] [40/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/transpose: Transpose
[SSAConverter] [41/64] Converting op MatMul_1: MatMul
[SSAConverter] [42/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/ExpandDims: ExpandDims
[SSAConverter] [43/64] Converting op Relu_1: Relu
[SSAConverter] [44/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/Tile: Tile
[SSAConverter] [45/64] Converting op Minimum_1: Minimum
[SSAConverter] [46/64] Converting op MatMul_2: MatMul
[SSAConverter] [47/64] Converting op Relu_2: Relu
[SSAConverter] [48/64] Converting op Minimum_2: Minimum
[SSAConverter] [49/64] Converting op Reshape_1: Reshape
[SSAConverter] [50/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/BlockLSTM/LSTMBlock: LSTMBlock
[SSAConverter] [51/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/BlockLSTM/get_tuple: get_tuple
[SSAConverter] [52/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/BlockLSTM/get_tuple_0: get_tuple
[SSAConverter] [53/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/mul: Mul
[SSAConverter] [54/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/concat: ConcatV2
[SSAConverter] [55/64] Converting op cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/concat_1: ConcatV2
[SSAConverter] [56/64] Converting op Reshape_2: Reshape
[SSAConverter] [57/64] Converting op new_state_c: GatherNd
[SSAConverter] [58/64] Converting op new_state_h: GatherNd
[SSAConverter] [59/64] Converting op MatMul_3: MatMul
[SSAConverter] [60/64] Converting op Relu_3: Relu
[SSAConverter] [61/64] Converting op Minimum_3: Minimum
[SSAConverter] [62/64] Converting op MatMul_4: MatMul
[SSAConverter] [63/64] Converting op raw_logits: Reshape
[SSAConverter] [64/64] Converting op logits: Softmax
[MLModel Pass] 15 disconnected constants are removed from graph.
/Users/reubenmorais/.local/share/virtualenvs/DeepSpeech-s4g1Z3_U/lib/python3.7/site-packages/coremltools/models/model.py:109: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: Error compiling model: "Error reading protobuf spec. validator error: The .mlmodel supplied is of version 4, intended for a newer version of Xcode. This version of Xcode supports model version 3 or earlier.".
  RuntimeWarning)
Out[4]:
input {
  name: "input_lengths"
  type {
    multiArrayType {
      shape: 1
      dataType: DOUBLE
    }
  }
}
input {
  name: "input_node"
  type {
    multiArrayType {
      shape: 1
      shape: 16
      shape: 19
      shape: 26
      dataType: DOUBLE
    }
  }
}
input {
  name: "previous_state_c"
  type {
    multiArrayType {
      shape: 1
      shape: 2048
      dataType: DOUBLE
    }
  }
}
input {
  name: "previous_state_h"
  type {
    multiArrayType {
      shape: 1
      shape: 2048
      dataType: DOUBLE
    }
  }
}
output {
  name: "logits"
  type {
    multiArrayType {
      dataType: DOUBLE
    }
  }
}
output {
  name: "new_state_c"
  type {
    multiArrayType {
      dataType: DOUBLE
    }
  }
}
output {
  name: "new_state_h"
  type {
    multiArrayType {
      dataType: DOUBLE
    }
  }
}

That was using coremltools==3.0b3 and tfcoreml==0.4b1
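
For anyone writing the code around this model, the input shapes in the spec above give the buffer sizes needed per inference call. Below is a minimal Python sketch of that arithmetic. It assumes every input is stored as an 8-byte double, as the spec's `DOUBLE` data type suggests; the helper name `element_counts` is made up for illustration. Reading the `input_node` dimensions as batch, time steps, context window, and MFCC features is my interpretation, not something the converter output states.

```python
from math import prod

# Input shapes copied from the converted model's spec above.
INPUT_SHAPES = {
    "input_lengths": (1,),
    "input_node": (1, 16, 19, 26),
    "previous_state_c": (1, 2048),
    "previous_state_h": (1, 2048),
}

# Assumption: DOUBLE in the spec means 8 bytes per value.
BYTES_PER_DOUBLE = 8


def element_counts(shapes):
    """Return the number of scalar values each named input holds."""
    return {name: prod(shape) for name, shape in shapes.items()}


counts = element_counts(INPUT_SHAPES)
total_bytes = sum(counts.values()) * BYTES_PER_DOUBLE

for name, count in counts.items():
    print(f"{name}: {count} values")
print(f"total input size per call: {total_bytes} bytes")
```

The two state inputs are as large as they are because the full LSTM cell and hidden state are passed in and out on every call.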

Stupid question, what's the advantage of converting to CoreML?
It seems iOS can run TFLite already?
Wouldn't we just need to run TFLite then hook it up to a compatible ds_ctcdecoder?

Since there's no support to convert TFLite to CoreML, this seems like the only path left? (without using the full .pb model)

I'm able to get a CoreML model out of the converter using latest master by removing the MFCC feature computation subgraph as well as the model_metadata node. Here's what I did:

According to this notebook, it seems that it's no longer necessary to remove the MFCC and metadata nodes?
https://github.com/apple/coremltools/blob/master/examples/neural_network_inference/tensorflow_converter/Tensorflow_1/deep_speech.ipynb

Even with the CoreML model, there's still work left to do in terms of adding the LM decoder, right? What would be the path to get a decoder running on iOS?

Stupid question, what's the advantage of converting to CoreML?
It seems iOS can run TFLite already?

I have no idea what CoreML would be useful for. What I do know for sure is that getting iOS supported is a bunch of work:

  • cross-compiling libdeepspeech.so
  • setting up iOS CI infra

CoreML makes it so that the device can take advantage of features like Metal for very fast processing. And it works in a more native format. From what I've seen of TFLite, you would have to write interpreter code in C for it to work properly. It's a lot of steps. I'm not sure about how to get the LM decoder working.
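
On the decoder question, one point of reference: the simplest way to turn the model's `logits` output into text is greedy (best-path) CTC decoding, which uses no language model at all. The sketch below is in Python for brevity and is not the `ds_ctcdecoder` API or its beam search. The alphabet, the choice of the last index as the CTC blank, and the names `greedy_ctc_decode` and `one_hot` are all assumptions for illustration.

```python
from typing import List, Sequence

# Assumed alphabet for illustration: space, a-z, apostrophe.
ALPHABET = " abcdefghijklmnopqrstuvwxyz'"
# Assumption: the CTC blank symbol is the extra class after the alphabet.
BLANK = len(ALPHABET)


def greedy_ctc_decode(logits: Sequence[Sequence[float]]) -> str:
    """Pick the best class per frame, collapse repeats, then drop blanks."""
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in logits
    ]
    chars: List[str] = []
    previous = BLANK
    for index in best_path:
        # Emit a character only when the class changes and is not blank.
        if index != previous and index != BLANK:
            chars.append(ALPHABET[index])
        previous = index
    return "".join(chars)


def one_hot(index: int, size: int = len(ALPHABET) + 1) -> List[float]:
    """Build a fake per-frame probability vector for the demo below."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Demo frames: h, h, blank, i, i
frames = [one_hot(ALPHABET.index(c)) for c in "hh"]
frames.append(one_hot(BLANK))
frames += [one_hot(ALPHABET.index(c)) for c in "ii"]
print(greedy_ctc_decode(frames))
```

A blank between two identical characters is what lets CTC emit double letters. An LM-backed decoder replaces the per-frame argmax with a beam search that also scores candidate text, and that is the part that would still need porting to iOS.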

This may become moot for iOS and the Mac because Apple recently added the ability to do on-device speech recognition in several languages, with word-by-word timestamps, which wasn't possible before iOS 13.

Hi,
Just to share my most recent experience with CoreML: last year I converted a Keras implementation of a CNN-RNN with a CTC decoder to CoreML using coremltools. It works, but the CoreML LSTM layer has to be run with the CPUOnly = True option (because the GPU implementation is different), so be aware of that. I found that a bit disappointing; it loses much of its appeal if it can't run optimized on the GPU or NPU.
I don't know if it has been fixed since November 2019, but that was the situation.

CoreML makes it so that the device can take advantage of features like Metal

Completely obscure for someone not aware of the macOS/iOS platform details :-).

If someone is able to provide PR / patches that adds that support, it would be very welcome, of course.

Metal is Apple's version of GPU accelerated programming. It's a shame that kezakool found out that the LSTM layer can only be run with CPUOnly = true. Not sure what that's about. Again, the pro of having CoreML support should be like having a Tensorflow Lite package, but accelerated for Apple's hardware. The idea is being able to run on macOS and iOS very fast.

No need to be mocking @lissyx. Your quote left out the comment "for very fast processing," which is basically the upshot of what Metal is for. I don't think we need to go into the specifics of Metal to explain why it would be a nice feature for DeepSpeech to support it.

Personally, I'm fine if this issue is closed. As I mentioned, there are now native implementations of the kind of features that DeepSpeech offers, which there were not when I made the suggestion.

You needn't be mocking @lissyx. Your quote left out the comment "for very fast processing," which is basically the upshot of what Metal is for. I don't think we need to go into the specifics of Metal to explain why it would be a nice feature for DeepSpeech to support it.

There was no mocking in my quotation, sorry if it came across that way. I was genuinely asking, because I only knew vaguely that Metal was some kind of GPU-related stuff, and as far as I knew it was tailored to rendering, not to the kind of use OpenCL covers.

Metal is Apple's version of GPU accelerated programming. It's a shame that kezakool found out that the LSTM layer can only be run with CPUOnly = true. Not sure what that's about. Again, the pro of having CoreML support should be like having a Tensorflow Lite package, but accelerated for Apple's hardware. The idea is being able to run on macOS and iOS very fast.

Thanks for the additional information. Sadly, it seems to be the same story as usual: the LSTM is unable to run on accelerated hardware, and so the round trip to the CPU kills the win. We see mostly the same on Android with NNAPI, as far as I could investigate.

What I can't see clearly is how much CoreML is, or could become, mandatory in the future. Also, on a technical note, this means more Apple hardware to build and run tests on, which is problematic, so that adds to the uncertainties.

Personally, I'm fine if this issue is closed. As I mentioned, there are now native implementations of the kind of features that DeepSpeech offers, which there were not when I made the suggestion.

I'm unsure, there seems to be activity and it looks like bugs that were hitting us were fixed, so I think it's not a waste to keep it open.

Thanks for your analysis @lissyx. My apologies for taking your comment the wrong way. Hopefully more good comes out of the issue.

More importantly, Apple has deprecated OpenGL and OpenCL (in addition to macOS no longer supporting NVIDIA graphics chips).

So it seems that in future, Metal will be the ONLY way to do anything with the GPU on a Mac.

Thanks for your analysis @lissyx. My apologies for taking your comment the wrong way. Hopefully more good comes out of the issue.

That's also why I emphasize that we would welcome PRs; it's just that there are multiple competing priorities. Nothing unfixable, but we have to pick our battles. We used to have no Windows support, then @carlfm01 contributed it and we ensured it was properly tested, so the call for contributions is not an empty one.

FWIW TensorFlow Lite has an experimental CoreML delegate which would be way easier to integrate into our existing TFLite native client than writing a new CoreML implementation from scratch. Could be worth exploring if anyone is interested: https://www.tensorflow.org/lite/performance/coreml_delegate

@reuben I don't believe that there is a way to write a CoreML implementation from scratch. A CoreML model is meant to be the product of conversion of a model from another framework.

Also, Core ML Tools, which is the conversion tool from Apple, expects a TensorFlow model and not TFLite as input. So you can convert a TensorFlow model to CoreML directly.

The above is at least one path. The other path would be to convert the TensorFlow model to TFLite and then use the Firebase library (I believe MLKit is what is needed) to deploy it on device to do the inference.

@fotiDim I have no idea what you're talking about. If you're replying to my latest message, I understand even less. The model is already converted, see above. I'm talking about the surrounding code that needs to use the CoreML API to perform inference, as well as computing features, managing inputs, outputs, LSTM hidden state, etc. Using the TFLite CoreML delegate would be a way to circumvent all of this work, by reusing our TFLite code.
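
To make the "managing inputs, outputs, LSTM hidden state" part concrete, here is a rough Python sketch of the loop the surrounding code has to run. The input and output names come from the converted spec earlier in the thread; everything else is an assumption for illustration: `run_stream`, `fake_predict`, the flat-list stand-in for real tensors, and passing 16 as `input_lengths`. A real client would call the CoreML or TFLite API where `predict` is called here.

```python
from typing import Callable, Dict, List

Tensor = List[float]  # flat stand-in for a real multi-array type
N_STATE = 2048        # state size taken from the converted spec
N_STEPS = 16          # assumption: time steps per chunk (second dim of input_node)


def run_stream(
    predict: Callable[[Dict[str, Tensor]], Dict[str, Tensor]],
    chunks: List[Tensor],
) -> List[Tensor]:
    """Feed feature chunks in order, carrying the LSTM state between calls."""
    state_c = [0.0] * N_STATE  # start from a zero cell state
    state_h = [0.0] * N_STATE  # start from a zero hidden state
    all_logits: List[Tensor] = []
    for chunk in chunks:
        outputs = predict({
            "input_node": chunk,
            "input_lengths": [float(N_STEPS)],
            "previous_state_c": state_c,
            "previous_state_h": state_h,
        })
        all_logits.append(outputs["logits"])
        # The new state becomes the previous state of the next call.
        state_c = outputs["new_state_c"]
        state_h = outputs["new_state_h"]
    return all_logits


def fake_predict(inputs: Dict[str, Tensor]) -> Dict[str, Tensor]:
    """Hypothetical stub model: bumps the state so the hand-off is visible."""
    return {
        "logits": [inputs["previous_state_c"][0]],
        "new_state_c": [v + 1.0 for v in inputs["previous_state_c"]],
        "new_state_h": [v + 1.0 for v in inputs["previous_state_h"]],
    }


dummy_chunks = [[0.0] * (16 * 19 * 26) for _ in range(3)]
print(run_stream(fake_predict, dummy_chunks))
```

Feature computation and decoding sit on either side of this loop. The argument for the TFLite delegate route is that the existing native client already implements all three.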

@reuben never mind then. I thought you were still fighting with the conversion. Your approach sounds correct.

With the latest changes to enable using TF 2.2 on the native client side, I was able to build the TFLite version of libdeepspeech.so for iOS. There's still some work to be done to integrate it with our CI before we can land it, and we would probably need to write a basic Objective-C/Swift wrapper for the API. I don't know anything about iOS development though, so if anyone is interested in helping out, it'd be greatly appreciated. I can help set up the development environment to build for iOS (or just send a copy of libdeepspeech.so) as well as help with CI changes needed for landing iOS support.

@reuben I have lots of Mac/iOS experience with Objective-C. I’m a bit tied up with other projects right now but ping me if no-one else takes this.
