DeepSpeech: issues with libdeepspeech and lm.binary

Created on 28 Sep 2017 · 9 Comments · Source: mozilla/DeepSpeech

hi all,

I'm trying to run the ./bin/run-ldc93s1.sh script. I have installed all Python dependencies (tensorflow, numpy etc.), extracted the pre-built deepspeech binary (?) from the 'native_client.tar.xz' file into the repository's 'native_client' directory and specified it (see below), but I'm running into trouble. Here's a sample output:

./bin/run-ldc93s1.sh

[ ! -f DeepSpeech.py ]
[ ! -f data/ldc93s1/ldc93s1.csv ]
[ -d ]
python -c from xdg import BaseDirectory as xdg; print(xdg.save_data_path("deepspeech/ldc93s1"))
checkpoint_dir=/home/sebastian/.local/share/deepspeech/ldc93s1
python -u DeepSpeech.py --train_files data/ldc93s1/ldc93s1.csv --dev_files data/ldc93s1/ldc93s1.csv --test_files data/ldc93s1/ldc93s1.csv --train_batch_size 1 --dev_batch_size 1 --test_batch_size 1 --n_hidden 494 --epoch 5 --checkpoint_dir ./checkpoints --decoder_library_path ./native_client/libctc_decoder_with_kenlm.so

W Parameter --validation_step needs to be >0 for early stopping to work

WARNING: libdeepspeech failed to load, resorting to deprecated code

Refer to README.md for instructions on installing libdeepspeech

I STARTING Optimization
I Training of Epoch 0 - loss: 332.397491
I Training of Epoch 1 - loss: 278.272827
I Training of Epoch 2 - loss: 185.577194
I Training of Epoch 3 - loss: 177.880112
I Training of Epoch 4 - loss: 207.362778
I FINISHED Optimization - training time: 0:00:09
Loading the LM will be faster if you build a binary file.
Reading data/lm/lm.binary
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
what(): native_client/kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece&, std::vector&) threw FormatLoadException.
first non-empty line was "version https://git-lfs.github.com/spec/v1" not \data. Byte: 43
Aborted (core dumped)

any ideas what is wrong here?

thanks,

seb

SebastianScherer88

Most helpful comment

Update: It's working, case closed! :)

SebastianScherer88 on 28 Sep 2017

👍2

All 9 comments

Hi Seb. You don't have an lm.binary file!
Yours is just a text file.

elpimous on 28 Sep 2017

Check the DeepSpeech issues to find out how to create an lm.binary file with the KenLM tools.

740

elpimous on 28 Sep 2017

👍1

No, you don't need to create your own lm.binary, the problem is you haven't installed Git LFS properly. Make sure Git LFS is installed properly before you clone the repository.

reuben on 28 Sep 2017

Oops!
Well, my language is French, so I need the complete setup...
SebastianScherer88, it should be easier for you.

elpimous on 28 Sep 2017

@reuben: just to clarify:
decoder_library_path - points to the extracted contents of the native_client tar
lm_binary_path - points to the KenLM language model that you created and that is part of the repo, downloaded via Git LFS when cloning
lm_trie_path - points to another (?) language model that you created and that is part of the repo, downloaded via Git LFS when cloning

SebastianScherer88 on 28 Sep 2017

--decoder_library_path should point to the libctc_decoder_with_kenlm.so file that is in the native_client archive.
--lm_binary_path should point to data/lm/lm.binary (that's the default value, so you don't need to change it)
--lm_trie_path should point to data/lm/trie (that's the default value, so you don't need to change it)
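
A quick way to confirm the diagnosis in this thread is to check whether the files at those default paths are real binaries or just Git LFS pointer stubs. The sketch below is not part of DeepSpeech; the helper names is_lfs_pointer and check_paths are made up for illustration. It only relies on the pointer header that appears in the error message above ("version https://git-lfs.github.com/spec/v1").

```python
import os
import sys

# First line of a Git LFS pointer file, as quoted in the KenLM error above.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER


def check_paths(paths):
    """Map each path to 'missing', 'lfs-pointer', or 'ok' (hypothetical helper)."""
    report = {}
    for path in paths:
        if not os.path.isfile(path):
            report[path] = "missing"
        elif is_lfs_pointer(path):
            report[path] = "lfs-pointer"
        else:
            report[path] = "ok"
    return report


if __name__ == "__main__":
    # Defaults mirror the --lm_binary_path and --lm_trie_path defaults mentioned above.
    paths = sys.argv[1:] or ["data/lm/lm.binary", "data/lm/trie"]
    for path, status in check_paths(paths).items():
        print(f"{path}: {status}")
```

If a file is reported as "lfs-pointer", the clone happened without a working Git LFS install. Installing Git LFS and then re-cloning, or running git lfs pull inside the existing checkout, should replace the stub with the real file.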