Deepspeech: Running python DeepSpeech.py on my machine ends prematurely with an error

Created on 4 Mar 2017 · 25 Comments · Source: mozilla/DeepSpeech

What should I do? Below is the command-line output from running DeepSpeech.py on my machine.

I just installed vanilla TensorFlow with pip install tensorflow and then installed DeepSpeech by following the instructions in the README on GitHub. Did I forget something?

Shyamals-iMac:DeepSpeech shyamalchandra$ python DeepSpeech.py 
Loading the LM will be faster if you build a binary file.
Reading /Users/shyamalchandra/DeepSpeech/data/lm/lm.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Traceback (most recent call last):
  File "DeepSpeech.py", line 23, in <module>
    from util.spell import correction
  File "/Users/shyamalchandra/DeepSpeech/util/spell.py", line 9, in <module>
    MODEL = kenlm.Model('./data/lm/lm.arpa')
  File "kenlm.pyx", line 117, in kenlm.Model.__init__ (python/kenlm.cpp:2242)
IOError: Cannot read model './data/lm/lm.arpa' (lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece &, std::vector<uint64_t> &) threw FormatLoadException. first non-empty line was "version https://git-lfs.github.com/spec/v1" not \data\. Byte: 43)

shyamalschandra

All 25 comments

You need to check out the DeepSpeech repo after installing git lfs [[1](https://git-lfs.github.com/)].

kdavis-mozilla on 4 Mar 2017

👍3 ❤1
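
For readers hitting the same error, here is a minimal sketch of how one might check whether files in a checkout are still Git LFS pointer stubs rather than the real data. It relies only on the pointer's first line as quoted in the error message above; the function names `is_lfs_pointer` and `find_lfs_pointers` are hypothetical helpers, not part of DeepSpeech or Git LFS.

```python
from pathlib import Path

# First line of a Git LFS pointer file, as quoted in the KenLM error above.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` starts like a Git LFS pointer stub."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(root):
    """List files under `root` that still look like LFS pointers."""
    return sorted(
        str(p) for p in Path(root).rglob("*") if p.is_file() and is_lfs_pointer(p)
    )
```

Calling `find_lfs_pointers("data/lm")` from the repo root should return an empty list once the large files have actually been downloaded; any path it lists is still a small text stub.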

Works now! Thanks, @kdavis-mozilla !

shyamalschandra on 4 Mar 2017

I have installed git lfs and still get the same error.

MalikMahnoor on 19 Mar 2017

👍2

@MalikMahnoor Try removing the old repo you checked out before you installed git lfs, then clone the repo again:

MalikMahnoor$ rm -fr DeepSpeech
MalikMahnoor$ git clone https://github.com/mozilla/DeepSpeech.git

kdavis-mozilla on 19 Mar 2017

Installed git-lfs, re-cloned repo.
To no avail; still getting an error.

libc++abi.dylib: terminating with uncaught exception of type lm::FormatLoadException: native_client/kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece &, std::vector<uint64_t> &) threw FormatLoadException.
first non-empty line was "1414678853" not \data\. Byte: 11
Abort trap: 6

odomojuli on 7 Mar 2018

How big is the lm.binary? For example, on my system:

kdavis@mlc1:~/Code/mozilla/DeepSpeech$  ls -lh data/lm/lm.binary
-rw-rw-r-- 1 kdavis kdavis 1.5G Dec 21 06:09 data/lm/lm.binary

kdavis-mozilla on 7 Mar 2018

@kdavis-mozilla @shyamalschandra Hi, I am having almost the same issue. By "checkout", do you mean checking out after cloning the whole DeepSpeech repo to my local machine? One unusual thing I have noticed is that I must run git lfs clone; otherwise a normal git clone gets stuck when it starts downloading the large binary file. I have tried both git checkout and git lfs checkout after the clone finished. I can see that the big binary in the data/lm folder is 1.5G, but when I go up one level and use ls -lh, it shows the data folder as only 400k. In the Ubuntu GUI, both the whole repo and the data folder show the correct size, but I cannot find a way to properly check out the large files in order to run the samples.
What can I do to get the sample to run successfully? Thank you in advance.

zby0902 on 14 Mar 2018

Have you tried one of the release files:
https://github.com/mozilla/DeepSpeech/releases/download/v0.1.1/deepspeech-0.1.1-models.tar.gz

madhavajay on 14 Mar 2018

Hi,
Thanks for your help, but my issue is not with the models. I am having problems checking out the DeepSpeech repository.

zby0902 on 15 Mar 2018

Here's the full output error:

Loading model from file models/output_graph.pbmm
2018-03-14 19:20:27.795073: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Not found: models/output_graph.pbmm; No such file or directory
Loaded model in 15.491s.
Loading language model from files models/trie my_audio_file.wav
Loading the LM will be faster if you build a binary file.
Reading models/trie
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
libc++abi.dylib: terminating with uncaught exception of type lm::FormatLoadException: native_client/kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece &, std::vector<uint64_t> &) threw FormatLoadException.
first non-empty line was "1414678853" not \data\. Byte: 11
Abort trap: 6

odomojuli on 15 Mar 2018

@odomojuli How large is lm.binary? See my previous comment in this issue.

kdavis-mozilla on 15 Mar 2018

@kdavis-mozilla Hey, can you help me find a clue about my issue? My lm.binary is 1.5G when I run ls -lh in the directory it is in, but one level up, the data folder containing lm.binary shows as only 400k. I have tried both git checkout and git lfs checkout, but neither helps. What should I do now?
ls -lh DeepSpeech/data/lm/lm.binary
-rwxrwxr-x 1 zby0902 zby0902 1.5G Mar 15 02:38 DeepSpeech/data/lm/lm.binary

But:

ls -lh
total 256K
-rw-rw-r-- 1 zby0902 zby0902 11K Mar 15 02:27 bazel.patch
drwxrwxr-x 2 zby0902 zby0902 4.0K Mar 15 02:27 bin
drwxrwxr-x 5 zby0902 zby0902 4.0K Mar 15 02:27 data

zby0902 on 15 Mar 2018

@zby0902 Generally, folder sizes do not include the contained items. As lm.binary is 1.5G, that part is fine. The problem must be somewhere else. Could you please describe in detail what you are doing?

kdavis-mozilla on 15 Mar 2018
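
To illustrate the point about folder sizes, here is a small sketch contrasting the size reported for a directory entry itself (what ls -lh shows for data) with the summed size of the files beneath it. The helper names `entry_size` and `tree_size` are hypothetical and only for illustration.

```python
import os


def entry_size(path):
    """Size of the entry itself, as stat reports it (what `ls -l` lists)."""
    return os.stat(path).st_size


def tree_size(root):
    """Total size in bytes of all regular files below `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            # Skip symlinks so a link's target is not counted.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total
```

On a typical Linux filesystem, `entry_size("data")` returns a few kilobytes regardless of contents, while `tree_size("data")` would include the full size of lm.binary.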

I am just trying to run the command from the project README.md, which is deepspeech models/output_graph.pbmm models/alphabet.txt models/lm.binary models/trie my_audio_file.wav
and it gives an error similar to the one at the beginning of this thread.

(DeepSpeech) zby0902@zby0902-Ubuntu16  ~  deepspeech models/output_graph.pbmm models/alphabet.txt models/lm.binary models/trie my_audio_file.wav
Loading model from file models/output_graph.pbmm
2018-03-15 19:10:25.702693: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Not found: models/output_graph.pbmm; No such file or directory
Loaded model in 14.137s.
Loading language model from files models/trie my_audio_file.wav
Loading the LM will be faster if you build a binary file.
Reading models/trie
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
what(): native_client/kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece&, std::vector<uint64_t>&) threw FormatLoadException.
first non-empty line was "1414678853" not \data\. Byte: 11
[1] 19711 abort (core dumped) deepspeech models/output_graph.pbmm models/alphabet.txt models/lm.binary

I have noticed that the graph file names differ: the one in my models folder is .pb, but the one in the command is .pbmm. Also, I built TensorFlow from source to match my CUDA 9.1, and I get this message:

2018-03-15 19:10:25.702693: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

but since it's not an error, I don't think it is the cause.

So given my full output, what do you think is the cause of my issue?

zby0902 on 15 Mar 2018
                    
I assume you didn't compile deepspeech from source, but did a pip install.

With that assumption, note that the argument ordering of deepspeech changed between the 0.1.1 release[1] and master[2], which you are referring to for your command line argument ordering.

As you have 0.1.1 (or 0.1.0) installed, you should use the following ordering:

./deepspeech models/output_graph.pb audio_input.wav models/alphabet.txt models/lm.binary models/trie

kdavis-mozilla on 15 Mar 2018

👍7 🎉5 ❤1
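
The two argument orderings discussed in this thread can be sketched side by side. This is only an illustration built from the two commands quoted here (the 0.1.1 ordering above and the README ordering from master at the time); `build_deepspeech_command` and its `layout` parameter are hypothetical and not part of the deepspeech package.

```python
def build_deepspeech_command(model, alphabet, lm, trie, audio, layout="0.1.1"):
    """Return the argv list for the deepspeech client in the given layout.

    layout="0.1.1":  model, audio, alphabet, lm, trie
    layout="master": model, alphabet, lm, trie, audio (README on master, March 2018)
    """
    if layout == "0.1.1":
        return ["deepspeech", model, audio, alphabet, lm, trie]
    if layout == "master":
        return ["deepspeech", model, alphabet, lm, trie, audio]
    raise ValueError("unknown layout: %r" % (layout,))
```

Passing the paths by keyword makes it harder to swap them by accident. With the master ordering fed to a 0.1.1 client, models/trie lands in the language-model position, which matches the "Reading models/trie" line in the failing output above.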
                            
Hi kdavis! It is solved! And yes, I just used pip to install the deepspeech Python interface rather than building from source, and it now works in quite a short time. Just one less important question: when I use one of the WAV files in the data folder for a test, I get an error saying only 16000 Hz WAV is supported. So why are there 8000 Hz samples in the data folder as well as 16000 Hz ones?

zby0902 on 15 Mar 2018
                    
@zby0902 Great!

The audio you talk about is in master and is for a new feature that allows for other sample rates.

kdavis-mozilla on 15 Mar 2018
                    
Hmm, what do you mean by "in master"? So the feature that supports other sample rates already exists, and the reason 8000 Hz is not supported by this command is that I didn't call it the correct way, so it is bound to the default of 16000 Hz?

zby0902 on 15 Mar 2018
                    
It's in master, but not in the command line client you have installed, which is slightly older.

kdavis-mozilla on 15 Mar 2018
                    
We have support in master to convert to mono 16-bit 16 kHz when the input is different, but we decided not to accept sample rates lower than 16 kHz, because in our testing the results were always too poor in that case.

lissyx on 15 Mar 2018
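
As a rough sketch of how one might inspect a WAV file before passing it to the client, the snippet below uses Python's standard wave module to read the channel count, sample width and sample rate, then applies the policy described in the comment above (mono, 16-bit, 16 kHz; lower rates rejected). The `describe_wav` and `check_wav` helpers and the returned labels are hypothetical; this is not the client's actual code.

```python
import wave

REQUIRED_RATE_HZ = 16000
REQUIRED_CHANNELS = 1
REQUIRED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples


def describe_wav(path):
    """Read basic format information from a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }


def check_wav(path):
    """Classify a WAV file as 'ok', 'needs conversion' or 'rejected'."""
    info = describe_wav(path)
    if info["sample_rate_hz"] < REQUIRED_RATE_HZ:
        return "rejected"  # assumed policy: rates below 16 kHz are not accepted
    if (
        info["sample_rate_hz"] == REQUIRED_RATE_HZ
        and info["channels"] == REQUIRED_CHANNELS
        and info["sample_width_bytes"] == REQUIRED_SAMPLE_WIDTH_BYTES
    ):
        return "ok"
    return "needs conversion"
```

Under these assumptions, an 8000 Hz sample from the data folder would be classified as "rejected", while a 44100 Hz stereo file would be "needs conversion".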
                            
Hey @kdavis-mozilla, can you change the command in the README? I faced a similar issue. It took me some time to find this thread; I initially thought there was a problem with my git lfs installation. Thanks!!

Perseus14 on 20 Mar 2018
                    
@Perseus14 Which command are you talking about? All the docs should be up to date regarding the order of arguments.

lissyx on 20 Mar 2018
                    
In the README, under the section Installing DeepSpeech Python bindings, the command

deepspeech models/output_graph.pbmm models/alphabet.txt models/lm.binary models/trie my_audio_file.wav

did not work for me. But the command mentioned by @kdavis-mozilla did:

deepspeech models/output_graph.pb audio_input.wav models/alphabet.txt models/lm.binary models/trie

I just followed the instructions under Using the Python package in the README, so I am not sure which version I installed, but the command to run deepspeech gave me "terminate called after throwing an instance of 'lm::FormatLoadException'". So I tried the one mentioned here and it worked!

Perseus14 on 20 Mar 2018
                    
@Perseus14 You are comparing two different versions. The README is properly up to date. The current release, v0.1.1, does have the WAV argument in a different place, but the matching README is correct as well: https://github.com/mozilla/DeepSpeech/blob/v0.1.1/README.md#installing-deepspeech-python-bindings

lissyx on 20 Mar 2018
                    
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock[bot] on 3 Jan 2019