Deepspeech: issues with libdeepspeech and lm.binary

Created on 28 Sep 2017 · 9 Comments · Source: mozilla/DeepSpeech

Hi all,

I'm trying to run the ./bin/run-ldc93s1.sh script. I have installed all Python dependencies (tensorflow, numpy, etc.), extracted the pre-built deepspeech binary (?) from the 'native_client.tar.xz' file into the repository's 'native_client' directory and specified it (see below), but I'm running into trouble. Here's a sample output:

./bin/run-ldc93s1.sh

+ [ ! -f DeepSpeech.py ]
+ [ ! -f data/ldc93s1/ldc93s1.csv ]
+ [ -d ]
+ python -c from xdg import BaseDirectory as xdg; print(xdg.save_data_path("deepspeech/ldc93s1"))
+ checkpoint_dir=/home/sebastian/.local/share/deepspeech/ldc93s1
+ python -u DeepSpeech.py --train_files data/ldc93s1/ldc93s1.csv --dev_files data/ldc93s1/ldc93s1.csv --test_files data/ldc93s1/ldc93s1.csv --train_batch_size 1 --dev_batch_size 1 --test_batch_size 1 --n_hidden 494 --epoch 5 --checkpoint_dir ./checkpoints --decoder_library_path ./native_client/libctc_decoder_with_kenlm.so

W Parameter --validation_step needs to be >0 for early stopping to work

WARNING: libdeepspeech failed to load, resorting to deprecated code

Refer to README.md for instructions on installing libdeepspeech

I STARTING Optimization
I Training of Epoch 0 - loss: 332.397491
I Training of Epoch 1 - loss: 278.272827
I Training of Epoch 2 - loss: 185.577194
I Training of Epoch 3 - loss: 177.880112
I Training of Epoch 4 - loss: 207.362778
I FINISHED Optimization - training time: 0:00:09
Loading the LM will be faster if you build a binary file.
Reading data/lm/lm.binary
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
what(): native_client/kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece&, std::vector&) threw FormatLoadException.
first non-empty line was "version https://git-lfs.github.com/spec/v1" not \data. Byte: 43
Aborted (core dumped)

any ideas what is wrong here?

thanks,

seb

All 9 comments

Hi Seb. You don't have an lm.binary file!
Yours is just a text file.

Check the DeepSpeech issues to find out how to create an lm.binary file with the KenLM tools.

#740

No, you don't need to create your own lm.binary, the problem is you haven't installed Git LFS properly. Make sure Git LFS is installed properly before you clone the repository.
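A quick way to confirm this diagnosis is to look at the first bytes of the file. The error above quotes the first line as "version https://git-lfs.github.com/spec/v1", which is the header of a Git LFS pointer file, not a KenLM model. The following is only a minimal sketch: the helper name `is_lfs_pointer` is made up for illustration, and the two paths are the defaults mentioned in this thread.

```python
from pathlib import Path

# Header that every Git LFS pointer file starts with (quoted in the error above).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


if __name__ == "__main__":
    # Default language model paths from this thread; adjust for your checkout.
    for name in ("data/lm/lm.binary", "data/lm/trie"):
        p = Path(name)
        if not p.exists():
            print(f"{name}: missing")
        elif is_lfs_pointer(p):
            print(f"{name}: Git LFS pointer, not the real file")
        else:
            print(f"{name}: looks like real content ({p.stat().st_size} bytes)")
```

If a file is reported as a pointer, the real content was never downloaded. Running `git lfs install` followed by `git lfs pull` inside the existing clone normally fetches it; recloning after installing Git LFS works as well.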

Oops!
Well, my language is French, so I need the complete setup...
SebastianScherer88, it should be easier for you.

@reuben: just to clarify:
decoder_library_path - points to the extracted content of the native_client tar
lm_binary_path - points to the KenLM language model that you created and that is part of the repo, downloaded via Git LFS when cloning
lm_trie_path - points to another (?) language model that you created and that is part of the repo, downloaded via Git LFS when cloning

  • --decoder_library_path should point to the libctc_decoder_with_kenlm.so file that is in the native_client archive.

  • --lm_binary_path should point to data/lm/lm.binary (that's the default value, so you don't need to change it)

  • --lm_trie_path should point to data/lm/trie (that's the default value, so you don't need to change it)
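To sanity-check these three paths before starting a training run, something like the sketch below could be run from the repository root. It only tests that each file exists; the dictionary `EXPECTED` and the helper `missing_paths` are hypothetical names for illustration, and the paths are simply the ones listed above.

```python
import os

# Flag -> path relative to the repository root, as described in this thread.
EXPECTED = {
    "--decoder_library_path": "native_client/libctc_decoder_with_kenlm.so",
    "--lm_binary_path": "data/lm/lm.binary",
    "--lm_trie_path": "data/lm/trie",
}


def missing_paths(expected, root="."):
    """Return the flags whose target is not an existing file under `root`."""
    return [
        flag
        for flag, rel in expected.items()
        if not os.path.isfile(os.path.join(root, rel))
    ]


if __name__ == "__main__":
    missing = missing_paths(EXPECTED)
    for flag in missing:
        print(f"{flag}: {EXPECTED[flag]} not found")
    if not missing:
        print("all three files are present")
```

Note that an existing file can still be a Git LFS pointer rather than the real model, so this check only catches a wrong or missing path.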

ok, thanks a lot, i'll just try recloning with git lfs properly and check back if things go wrong. feel free to close, thanks again!

Update: It's working, case closed! :)

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
