When I try to build DS for a new language, I need to train a language model using KenLM. I can run the KenLM toolkit, but it always fails with:
in void lm::builder::{anonymous}::StatCollector::CalculateDiscounts(const lm::builder::DiscountConfig&) threw BadDiscountException because `discounts_[i].amount[j] < 0.0 || discounts_[i].amount[j] > j'.
My corpus is around 20,000 utterances (news), and I am building a 5-gram model. How can I build an LM for DS with KenLM, or can I use another toolkit?
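For context, the condition quoted in the error is a range check on the Kneser-Ney discounts that lmplz estimates from count-of-counts. The sketch below assumes the standard closed-form estimate for modified Kneser-Ney, Y = n1 / (n1 + 2*n2) and D_j = j - (j+1) * Y * n_{j+1} / n_j. The helper name `kn_discounts` and the sample counts are made up for illustration; they are not taken from a real corpus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate modified Kneser-Ney discounts D1..D3 for one
# n-gram order from count-of-counts n1..n4 (n_j = number of distinct n-grams
# with adjusted count j). Assumes the closed-form estimate:
#   Y = n1 / (n1 + 2*n2),  D_j = j - (j+1) * Y * n_{j+1} / n_j
kn_discounts() {
  awk -v n1="$1" -v n2="$2" -v n3="$3" -v n4="$4" 'BEGIN {
    n[1] = n1; n[2] = n2; n[3] = n3; n[4] = n4
    y = n[1] / (n[1] + 2 * n[2])
    for (j = 1; j <= 3; j++) {
      d = j - (j + 1) * y * n[j + 1] / n[j]
      # Same range test as in the error message: a discount must lie in [0, j].
      verdict = (d < 0 || d > j) ? "OUT_OF_RANGE" : "ok"
      printf "D%d=%.2f %s\n", j, d, verdict
    }
  }'
}

# Made-up counts: more n-grams seen three times than seen twice.
kn_discounts 10 5 8 2
```

With counts shaped like these, the second discount comes out negative, which is the kind of situation a small or repetitive corpus can produce.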
I used the steps described at https://kheafield.com/code/kenlm/estimation/ to build an ARPA model, then a binary model from it.
They suggest doing it all in one step as follows:
bin/lmplz -o 5 <text | bin/build_binary /dev/stdin text.binary
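One thing to watch with the one-step form is that a pipeline normally reports only the exit status of its last command, so a failure in lmplz can be hidden behind build_binary. A minimal sketch of a wrapper that guards against this, assuming bash; `KENLM_BIN` and `build_lm_one_step` are hypothetical names, not part of KenLM:

```shell
#!/usr/bin/env bash
# Sketch: one-step ARPA-to-binary build. KENLM_BIN is an assumed variable
# pointing at the directory that holds lmplz and build_binary.
KENLM_BIN="${KENLM_BIN:-bin}"

build_lm_one_step() {
  local corpus="$1" order="$2" out="$3"
  # Refuse to start if the corpus file is missing or empty.
  if [ ! -s "$corpus" ]; then
    echo "corpus '$corpus' is missing or empty" >&2
    return 2
  fi
  # Run in a subshell so pipefail does not leak into the caller.
  (
    set -o pipefail
    "$KENLM_BIN/lmplz" -o "$order" < "$corpus" \
      | "$KENLM_BIN/build_binary" /dev/stdin "$out"
  )
}

# Usage (file names as in the command above):
#   build_lm_one_step text 5 text.binary
```

With `pipefail` set, the function returns non-zero if either stage fails, which makes an lmplz exception easier to spot in a script.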
Yes, I followed the same steps (https://kheafield.com/code/kenlm/estimation/) but got the error mentioned above.
I'm not sure. It worked for me. Could you open an issue on the kenlm repo?
Yes, I have already opened an issue, but there has been no reply.
Did you have success with ARPA creation?
lmplz --text xxx/xxx/sentences.txt --arpa xxx/xxx/sentences.arpa -o 5
And with binary creation?
./build_binary -s xxx/sentences.arpa xxx/lm.binary
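If the ARPA step is the one that throws BadDiscountException on a small corpus, two things that may be worth trying are deduplicating the input and passing lmplz's `--discount_fallback` option, which, as far as I understand it, substitutes default discounts when the closed-form estimate fails. The sketch below combines both. `LMPLZ`, `BUILD_BINARY`, `dedupe_lines`, and `build_lm_two_step` are hypothetical names, and whether fallback discounts are acceptable for a given model is a judgement call.

```shell
#!/usr/bin/env bash
# Sketch: two-step build (ARPA, then binary) with two hedges for small corpora:
# deduplicated input and --discount_fallback. LMPLZ and BUILD_BINARY are
# assumed variables so the binaries can live anywhere.
LMPLZ="${LMPLZ:-lmplz}"
BUILD_BINARY="${BUILD_BINARY:-build_binary}"

# Drop repeated lines while keeping the first occurrence of each, in order.
dedupe_lines() {
  awk '!seen[$0]++' "$1"
}

build_lm_two_step() {
  local corpus="$1" order="$2" arpa="$3" binary="$4"
  dedupe_lines "$corpus" > "$corpus.dedup" || return 1
  # --discount_fallback is kept last, since it can optionally take values.
  "$LMPLZ" --text "$corpus.dedup" --arpa "$arpa" -o "$order" \
    --discount_fallback || return 1
  "$BUILD_BINARY" "$arpa" "$binary"
}

# Usage (hypothetical paths):
#   build_lm_two_step sentences.txt 5 sentences.arpa lm.binary
```

Keeping the two steps separate makes it clear which stage failed, at the cost of writing the intermediate ARPA file to disk.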
Nothing actionable here, and @elpimous' tutorial on https://discourse.mozilla.org/c/deep-speech already covers this pretty well.
I wrote a step-by-step guide for training a KenLM model. You can check it here: https://github.com/kmario23/KenLM-training