DeepSpeech: TFLite + Quantization

Created on 23 Jan 2019 · 9 Comments · Source: mozilla/DeepSpeech

lissyx

👍1

All 9 comments

With post_training_quantize=True in TOCO, on a Google Pixel 2 device:

walleye:/data/local/tmp $ ./lite_benchmark_model --graph=output_graph_non_quant.tflite --show_flops --input_layer=input_node,previous_state_c,previous_state_h --input_layer_type=float,float,float --input_layer_shape=1,16,19,26:1:1,2048:1,2048>
STARTING!
The number of items in --input_layer_shape (1,16,19,26:1:1,2048:1,2048, with 4 items) must match the number of items in --input_layer (input_node,previous_state_c,previous_state_h, with 3 items). For example --input_layer=input1,input2 --input_layer_shape=1,224,224,4:1,20
Num runs: [50]
Inter-run delay (seconds): [-1]
Num threads: [1]
Benchmark name: []
Output prefix: []
Warmup runs: [1]
Graph: [output_graph_non_quant.tflite]
Input layers: [input_node,previous_state_c,previous_state_h]
Input shapes: [1,16,19,26:1:1,2048:1,2048]
Use nnapi : [0]
Loaded model output_graph_non_quant.tflite
resolved reporter
Initialized session in 38.631ms
Running benchmark for 1 iterations 
count=1 curr=581397

Running benchmark for 50 iterations 
count=50 first=480581 curr=480893 min=470353 max=487364 avg=479385 std=3728

Average inference timings in us: Warmup: 581397, Init: 38631, no stats: 479385
walleye:/data/local/tmp $ ./lite_benchmark_model --graph=output_graph_quant.tflite  --show_flops --input_layer=input_node,previous_state_c,previous_state_h --input_layer_type=float,float,float --input_layer_shape=1,16,19,26:1:1,2048:1,2048 --ou
STARTING!
The number of items in --input_layer_shape (1,16,19,26:1:1,2048:1,2048, with 4 items) must match the number of items in --input_layer (input_node,previous_state_c,previous_state_h, with 3 items). For example --input_layer=input1,input2 --input_layer_shape=1,224,224,4:1,20
Num runs: [50]
Inter-run delay (seconds): [-1]
Num threads: [1]
Benchmark name: []
Output prefix: []
Warmup runs: [1]
Graph: [output_graph_quant.tflite]
Input layers: [input_node,previous_state_c,previous_state_h]
Input shapes: [1,16,19,26:1:1,2048:1,2048]
Use nnapi : [0]
Loaded model output_graph_quant.tflite
resolved reporter
Initialized session in 36.913ms
Running benchmark for 1 iterations 
count=1 curr=288370

Running benchmark for 50 iterations 
count=50 first=121900 curr=123450 min=121673 max=126796 avg=122527 std=1159

Average inference timings in us: Warmup: 288370, Init: 36913, no stats: 122527
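
The two summary lines above can be reduced to a single speedup figure. The following is a minimal sketch, not part of the original thread: the summary strings are copied from the two Pixel 2 runs, and `parse_stats` is a hypothetical helper introduced only for illustration.

```python
import re

# Summary lines copied verbatim from the two Pixel 2 runs above.
NON_QUANT = "count=50 first=480581 curr=480893 min=470353 max=487364 avg=479385 std=3728"
QUANT = "count=50 first=121900 curr=123450 min=121673 max=126796 avg=122527 std=1159"


def parse_stats(line):
    """Turn the key=value pairs of a benchmark summary line into floats."""
    return {key: float(value) for key, value in re.findall(r"(\w+)=([0-9.e+]+)", line)}


non_quant = parse_stats(NON_QUANT)
quant = parse_stats(QUANT)

# The tool reports timings in microseconds ("Average inference timings in us").
speedup = non_quant["avg"] / quant["avg"]
print(f"non-quantized: {non_quant['avg'] / 1000:.1f} ms per run")
print(f"quantized:     {quant['avg'] / 1000:.1f} ms per run")
print(f"speedup:       {speedup:.2f}x")
```

On these numbers the quantized graph is roughly 3.9 times faster per run on the Pixel 2, single-threaded.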

lissyx on 23 Jan 2019

Before going further we need to document accuracy: https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/contrib/lite/tools/accuracy/README.md

lissyx on 23 Jan 2019

Accuracy example with native client:

walleye:/data/local/tmp/arm64 $ LD_LIBRARY_PATH=/data/local/tmp/arm64/ ./deepspeech --model /sdcard/deepspeech/output_graph_non_quant.tflite --alphabet /sdcard/deepspeech/alphabet.txt --audio ../test-alex.en.wav -t
TensorFlow: v1.12.0-10-ge232881
DeepSpeech: v0.4.1-18-g5d842c2
audio_format=1
num_channels=1
sample_rate=16000
bits_per_sample=16
res.buffer_size=116000
i headlor hendo helow
cpu_time_overall=7.89830
walleye:/data/local/tmp/arm64 $ LD_LIBRARY_PATH=/data/local/tmp/arm64/ ./deepspeech --model /sdcard/deepspeech/output_graph_quant.tflite --alphabet /sdcard/deepspeech/alphabet.txt --audio ../test-alex.en.wav -t    
TensorFlow: v1.12.0-10-ge232881
DeepSpeech: v0.4.1-18-g5d842c2
audio_format=1
num_channels=1
sample_rate=16000
bits_per_sample=16
res.buffer_size=116000
a hearlor helo helo
cpu_time_overall=3.01929
walleye:/data/local/tmp/arm64 $
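
The `cpu_time_overall` values above are easier to interpret relative to the length of the clip. A minimal sketch of that calculation follows. It assumes that `res.buffer_size` is the size of the raw PCM buffer in bytes and that `cpu_time_overall` is in seconds; neither is stated in the log itself. `audio_seconds` is a hypothetical helper.

```python
def audio_seconds(buffer_bytes, sample_rate, bits_per_sample, num_channels):
    """Duration of a raw PCM buffer, assuming buffer_bytes counts bytes."""
    bytes_per_frame = (bits_per_sample // 8) * num_channels
    return buffer_bytes / bytes_per_frame / sample_rate


# Values copied from the two native client runs above.
duration = audio_seconds(buffer_bytes=116000, sample_rate=16000,
                         bits_per_sample=16, num_channels=1)
cpu_time = {"non_quant": 7.89830, "quant": 3.01929}

# Real-time factor: processing time divided by audio duration (below 1 is faster than real time).
rtf = {name: seconds / duration for name, seconds in cpu_time.items()}

print(f"audio duration: {duration:.3f} s")
for name, value in rtf.items():
    print(f"{name}: real-time factor {value:.2f}")
```

Under those assumptions the clip is 3.625 s long, so the non-quantized model runs slower than real time on this device and the quantized model slightly faster than real time.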

lissyx on 23 Jan 2019

Non-quantized vs. quantized model on LePotato:

lepotato@lepotato:~/ds$ ./lite_benchmark_model --graph=output_graph_non_quant.tflite --show_flops --input_layer=input_node,previous_state_c,previous_state_h --input_layer_type=float,float,float --input_layer_shape=1,16,19,26:1:1,2048:1,2048 --output_layer=logits,new_state_c,new_state_h
STARTING!
The number of items in --input_layer_shape (1,16,19,26:1:1,2048:1,2048, with 4 items) must match the number of items in --input_layer (input_node,previous_state_c,previous_state_h, with 3 items). For example --input_layer=input1,input2 --input_layer_shape=1,224,224,4:1,20
Num runs: [50]
Inter-run delay (seconds): [-1]
Num threads: [1]
Benchmark name: []
Output prefix: []
Warmup runs: [1]
Graph: [output_graph_non_quant.tflite]
Input layers: [input_node,previous_state_c,previous_state_h]
Input shapes: [1,16,19,26:1:1,2048:1,2048]
Use nnapi : [0]
Loaded model output_graph_non_quant.tflite
resolved reporter
Initialized session in 5.257ms
Running benchmark for 1 iterations
count=1 curr=1887383

Running benchmark for 50 iterations
count=50 first=1791325 curr=1790110 min=1787146 max=1792665 avg=1.78988e+06 std=1173

Average inference timings in us: Warmup: 1.88738e+06, Init: 5257, no stats: 1.78988e+06
lepotato@lepotato:~/ds$ ./lite_benchmark_model --graph=output_graph_quant.tflite --show_flops --input_layer=input_node,previous_state_c,previous_state_h --input_layer_type=float,float,float --input_layer_shape=1,16,19,26:1:1,2048:1,2048 --output_layer=logits,new_state_c,new_state_h 
STARTING!
The number of items in --input_layer_shape (1,16,19,26:1:1,2048:1,2048, with 4 items) must match the number of items in --input_layer (input_node,previous_state_c,previous_state_h, with 3 items). For example --input_layer=input1,input2 --input_layer_shape=1,224,224,4:1,20
Num runs: [50]
Inter-run delay (seconds): [-1]
Num threads: [1]
Benchmark name: []
Output prefix: []
Warmup runs: [1]
Graph: [output_graph_quant.tflite]
Input layers: [input_node,previous_state_c,previous_state_h]
Input shapes: [1,16,19,26:1:1,2048:1,2048]
Use nnapi : [0]
Loaded model output_graph_quant.tflite
resolved reporter
Initialized session in 4.922ms
Running benchmark for 1 iterations 
count=1 curr=711037

Running benchmark for 50 iterations 
count=50 first=650351 curr=650633 min=650351 max=651962 avg=651087 std=386

Average inference timings in us: Warmup: 711037, Init: 4922, no stats: 651087
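
Every run above prints a warning that `--input_layer_shape` has 4 items while `--input_layer` has 3. The sketch below reproduces that count, assuming (from the wording of the warning) that layer names are comma-separated and shapes are colon-separated. `parse_layer_args` is a hypothetical helper, not the benchmark tool's own parser, and `CANDIDATE_SHAPES` is only a guess at the intended value, made by dropping the lone `1` entry.

```python
def parse_layer_args(input_layer, input_layer_shape):
    """Split the two flag values the way the warning message describes them."""
    names = input_layer.split(",")
    shapes = [[int(dim) for dim in shape.split(",")]
              for shape in input_layer_shape.split(":")]
    return names, shapes


LAYERS = "input_node,previous_state_c,previous_state_h"
LOGGED_SHAPES = "1,16,19,26:1:1,2048:1,2048"    # as passed in the runs above
CANDIDATE_SHAPES = "1,16,19,26:1,2048:1,2048"   # assumption: the lone ":1" is stray

for label, value in (("logged", LOGGED_SHAPES), ("candidate", CANDIDATE_SHAPES)):
    names, shapes = parse_layer_args(LAYERS, value)
    status = "ok" if len(names) == len(shapes) else "mismatch"
    print(f"{label}: {len(names)} layers, {len(shapes)} shapes -> {status}")
```

The thread does not say whether the mismatch affected the reported timings.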

lissyx on 23 Jan 2019

Non-quantized TFLite model, v0.4.1, testing on LibriSpeech test-clean dataset:

Test - WER: 0.104306, CER: 0.044113, loss: 0.000000

real    52m54,435s
user    1610m25,233s
sys     2m26,146s
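
For readers unfamiliar with the two metrics reported above, here is a minimal sketch of the usual definitions: WER is the word-level edit distance divided by the number of reference words, and CER is the same at character level. This is the textbook form; the evaluation script used for the numbers in this thread may aggregate over the dataset differently. The example sentences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate of one hypothesis against one reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate of one hypothesis against one reference."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical sentences, not taken from LibriSpeech.
reference = "the quick brown fox"
hypothesis = "the quick brown box"
print(f"WER: {wer(reference, hypothesis):.4f}")
print(f"CER: {cer(reference, hypothesis):.4f}")
```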

lissyx on 13 Feb 2019

Quantized TFLite model, v0.4.1, testing on LibriSpeech test-clean dataset:

Test - WER: 0.107406, CER: 0.046574, loss: 0.000000

real    10m42,961s
user    313m45,772s
sys     0m40,347s

lissyx on 13 Feb 2019

👍2

TF model v0.4.1, testing on LibriSpeech test-clean dataset:

Test - WER: 0.084925, CER: 0.035407, loss: 0.000000

real    46m7,826s
user    1282m18,824s
sys     101m11,893s

lissyx on 13 Feb 2019

Given the low WER impact and the large performance gain, let's enable that.
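
The trade-off behind this decision can be quantified from the three LibriSpeech test-clean reports above. The following is a minimal sketch with the values copied from those reports; `wall_seconds` is a hypothetical helper that parses the `real` field of `time` output, which uses a comma as the decimal separator here.

```python
import re


def wall_seconds(real):
    """Parse a `time` value such as '52m54,435s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+),(\d+)s", real)
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(f"{seconds}.{fraction}")


# WER and wall-clock time copied from the three evaluation reports above.
RESULTS = {
    "tf": {"wer": 0.084925, "real": "46m7,826s"},
    "tflite": {"wer": 0.104306, "real": "52m54,435s"},
    "tflite_quant": {"wer": 0.107406, "real": "10m42,961s"},
}

base = RESULTS["tflite"]
quant = RESULTS["tflite_quant"]

wer_delta = quant["wer"] - base["wer"]
wer_relative = wer_delta / base["wer"]
time_speedup = wall_seconds(base["real"]) / wall_seconds(quant["real"])

print(f"WER change from quantization: +{wer_delta:.6f} absolute, +{wer_relative:.1%} relative")
print(f"wall-clock speedup of the evaluation run: {time_speedup:.2f}x")
```

On these figures, quantization costs about 0.3 percentage points of WER against the non-quantized TFLite model, while the evaluation run finishes nearly five times faster.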

lissyx on 14 Feb 2019

👍1

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock[bot] on 16 Mar 2019
