While we plan to target other languages, we haven't made any decision as to which is the next language to target yet. If you've sufficient speech data for Urdu, thousands of hours of speech, we'd be willing to help in modifying our code for Urdu and lending some server resources for training.

kdavis-mozilla on 1 Jul 2017

Actually, we are trying to make changes in spell.py and text.py for the Urdu language, and we are also working on a language model for Urdu. We have an Urdu corpus on which we will be doing our training. Is this the right approach?

MalikMahnoor on 1 Jul 2017

@MalikMahnoor Sounds about right. (I'd have to see the details to be sure.) How large an Urdu corpus do you have?

kdavis-mozilla on 11 Jul 2017

700 sentences along with their audio, but we are using this just to make a prototype. We can collect more data if this corpus shows good results.

MalikMahnoor on 11 Jul 2017

And by the way, your spell.py and text.py are working fine for Urdu as well. We have made our language model and changed the dataset to Urdu. The code works fine up to the creation of the execution context, but it gives errors on training. To our understanding, the errors are caused by n_characters (which we have also changed to the number of characters in Urdu), but there are other errors too.

MalikMahnoor on 11 Jul 2017

Could you post the errors you're getting? Maybe we can help.

kdavis-mozilla on 13 Jul 2017

We have managed to fix those errors. It now goes into training and the code works fine, but only for isolated words, not sentences. We are trying to fix text.py for that. Hopefully we'll be able to do that within a few days.

MalikMahnoor on 13 Jul 2017

Awesome!

kdavis-mozilla on 13 Jul 2017

Thanks !

MalikMahnoor on 13 Jul 2017

@MalikMahnoor When you get an Urdu model up and running and want to distribute it to the world, we'd be happy to help host the model for you. Providing, say, S3 storage so others can download the model.

kdavis-mozilla on 25 Jul 2017

Hi @MalikMahnoor, I am also working on Urdu speech recognition, but using a different approach. I have already tried the single-speaker, 700-sentence corpus recorded by Agha Ali. It is not a useful corpus, and I am now planning to add data from new sources. We can collaborate. Thanks.

abbasrazaali on 28 Dec 2017

@abbasrazaali @MalikMahnoor I would suggest you also take a look at Common Voice. They are working on localization and internationalization, which would help you augment the corpus.

lissyx on 1 Feb 2018

@kdavis-mozilla Are there any specific requirements for the audio recordings you need? What if we provide you with thousands of hours of recordings of Urdu TV/radio? Please specify any such requirements. Can you also explain what type of code changes are needed to accomplish Urdu support?

sajjadsaleem on 3 Apr 2018

@sajjadsaleem I don't know if there are hard and fast _requirements_. However, there are some things which we have found to work.

Audio recorded at 16 kHz, 16-bit or better, and in mono (stereo _may_ work too)
Audio fed to the system at 16 kHz, 16-bit, mono
Text "normalized" so that the transcription corresponds to exactly what is in the audio
Audio segmented into segments of about 1-15 seconds before being fed to the system
Audio need _not_ be perfectly without noise
Audio should reflect the environment in which the system is to be used (If used in noisy area, the audio should be noisy too so the system learns to deal with noise.)
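
A minimal sketch of how the format points in the list above could be checked before training, using only Python's standard `wave` module. The function name `check_wav` and its default thresholds are illustrative assumptions taken from the list; they are not part of DeepSpeech.

```python
import wave


def check_wav(path, rate=16000, sample_width=2, channels=1,
              min_seconds=1.0, max_seconds=15.0):
    """Return a list of human-readable problems found in one WAV file."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {rate}")
        if wav.getsampwidth() != sample_width:
            problems.append(
                f"sample width is {8 * wav.getsampwidth()} bit, "
                f"expected {8 * sample_width}")
        if wav.getnchannels() != channels:
            problems.append(
                f"{wav.getnchannels()} channels, expected {channels}")
        # Duration in seconds = number of frames / frames per second.
        seconds = wav.getnframes() / wav.getframerate()
    if not min_seconds <= seconds <= max_seconds:
        problems.append(
            f"duration is {seconds:.1f} s, "
            f"expected {min_seconds}-{max_seconds} s")
    return problems
```

An empty list means the file matches the format and length guidance; it says nothing about whether the transcript matches the audio, which still has to be verified separately.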

As for supporting Urdu, you'll need to make changes similar to those required for French support, which are described here[1], or German, described here[2].

kdavis-mozilla on 3 Apr 2018

Hi, it's me, Sehar Gul.
DeepSpeech is new to me. I have to train it for the Urdu language. Can you help me with how to train it for Urdu?

sehargul on 5 Apr 2018

@sehargul A good start is the discourse post[1]; further discussion can be had there.

kdavis-mozilla on 5 Apr 2018

Any updates on the Urdu model?

cmhashim on 21 Oct 2018

Hi. I couldn't find the spell.py file in DeepSpeech master (version 0.2.0 alpha 0). What could be a substitute for it?
Thank you!

Hafsa26 on 31 Oct 2018

@kdavis-mozilla, can you please answer my query? What could be a substitute for the spell.py file in DeepSpeech master, version 0.2.0 alpha 0?
Thank you!

Hafsa26 on 31 Oct 2018

@Hafsa26 There have been a lot of changes since spell.py was in the repo. Could you say a little more about what you want to do?

kdavis-mozilla on 1 Nov 2018

> @Hafsa26 There have been a lot of changes since spell.py was in the repo. Could you say a little more about what you want to do?

I am working on an Urdu speech recognition system using DeepSpeech. As you said above, we need to make changes in text.py and spell.py for it. I found text.py in the repo but couldn't find spell.py.
So what could be the solution? Secondly, if you have any blog post or guide on building a speech recognition system for another language using DeepSpeech, please share it. Thank you!

Hafsa26 on 1 Nov 2018

@Hafsa26 I guess I'm looking more towards: What is your goal? spell.py is no longer in the repo, but the functionality it provided is. So, I need to know what functionality you are trying to use so I can point you in the right direction.

kdavis-mozilla on 1 Nov 2018

@MalikMahnoor What is the status of your work on the Urdu language model? Can you share?

waqasr6 on 4 Dec 2018

@kdavis-mozilla I want to create my own language model for the Urdu language. Can you please help me with this? I've collected approximately 9000 audio recordings in Urdu of 100 different sentences. Currently I am training on this data with Roman transcription, but I want to train it with Urdu transcription.

waqasr6 on 4 Dec 2018

> @kdavis-mozilla I want to create my own language model for the Urdu language. Can you please help me with this? I've collected approximately 9000 audio recordings in Urdu of 100 different sentences. Currently I am training on this data with Roman transcription, but I want to train it with Urdu transcription.

What's wrong with the current documentation? Everything you need to achieve that should be documented.

lissyx on 4 Dec 2018

@lissyx Can you please elaborate on which documentation you are talking about, or share that documentation here? I've never found any for languages other than English.

waqasr6 on 4 Dec 2018

What about README.md ? I really don't understand what's blocking you.

lissyx on 4 Dec 2018

@lissyx - the README.md has all the info needed, but I will admit it's hard to pick it out for newcomers... maybe it's time to write a blogpost for "how to train DeepSpeech on a new language"?

JRMeyer on 4 Dec 2018

> @lissyx - the README.md has all the info needed, but I will admit it's hard to pick it out for newcomers... maybe it's time to write a blogpost for "how to train DeepSpeech on a new language"?

Maybe, but again, if we don't know the pain points, it's less efficient. If you ask me, it's trivial and all properly documented. Obviously that's not the case, and thus I'm unsure I can produce anything more useful than the existing documentation.

lissyx on 4 Dec 2018

I've been running into all the pain points getting DS to work with all the CV langs, so I definitely could write up that post... I'm just concerned about how much time it would take: a week or so, I'd guess.

JRMeyer on 4 Dec 2018

When I finish the Windows parts, I'll start working on it for Spanish. @JRMeyer, I can share the "hardest parts" with you if you want.

carlfm01 on 4 Dec 2018

> @lissyx - the README.md has all the info needed, but I will admit it's hard to pick it out for newcomers... maybe it's time to write a blogpost for "how to train DeepSpeech on a new language"?

It would be very helpful indeed.

waqasr6 on 4 Dec 2018

> What about README.md ? I really don't understand what's blocking you.

I just need to know whether DeepSpeech supports RTL transcription for languages like Arabic and Urdu.

waqasr6 on 5 Dec 2018

@waqasr6 I know developers outside of Mozilla have used it for Urdu, but we at Mozilla have never used it for such.

kdavis-mozilla on 5 Dec 2018

> I just need to know whether DeepSpeech supports RTL transcription for languages like Arabic and Urdu.

What kind of constraints do you have in mind? We support UTF-8, so characters should be handled properly, and RTL should not be a problem since this is how training will be done.
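
To make the UTF-8 point concrete: Unicode text is stored in logical (reading) order, and right-to-left display is a rendering concern, so an Urdu transcript is just a sequence of code points like any other. The sketch below collects the distinct characters used in a set of transcripts, which is roughly what an alphabet file needs. The helper names `build_alphabet` and `write_alphabet`, and the one-character-per-line file layout, are assumptions for illustration and should be checked against the DeepSpeech version in use.

```python
def build_alphabet(transcripts):
    """Collect every distinct character used in the transcripts, sorted by code point."""
    chars = set()
    for line in transcripts:
        # Drop line endings so "\n" never ends up in the alphabet.
        chars.update(line.strip("\n"))
    return sorted(chars)


def write_alphabet(chars, path):
    # Assumed layout: one character per line, UTF-8 encoded.
    with open(path, "w", encoding="utf-8") as handle:
        for ch in chars:
            handle.write(ch + "\n")


if __name__ == "__main__":
    # Two Urdu words written with escapes so the source stays ASCII-only.
    sample = ["\u0633\u0644\u0627\u0645 \u062f\u0646\u06cc\u0627"]
    alphabet = build_alphabet(sample)
    print(len(alphabet), [f"U+{ord(c):04X}" for c in alphabet])
```

The length of the resulting list is the kind of number that a setting such as n_characters, mentioned earlier in the thread, would have to agree with.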

lissyx on 5 Dec 2018

@lissyx Thanks. Many things are clear to me now. I'll try it with an Urdu language model.

waqasr6 on 5 Dec 2018

@lissyx Hi, how do I convert the output_graph.pb model into a .pbmm model?
I have my Urdu language model with a .pb extension. Is there any way to convert it into .pbmm?

Thank you!

Hafsa26 on 19 Feb 2019

@Hafsa26 Have you read README.md ?

lissyx on 19 Feb 2019

I did. To check the model, I need output_graph.pbmm, but I got output_graph.pb.
Do I need to make some changes to get a .pbmm graph rather than a .pb graph?

Hafsa26 on 19 Feb 2019

I think what lissyx is referring to is this.

kdavis-mozilla on 19 Feb 2019

https://index.taskcluster.net/v1/task/project.deepspeech.tensorflow.pip.r1.12.cpu/artifacts/public/convert_graphdef_memmapped_format
https://index.taskcluster.net/v1/task/project.deepspeech.tensorflow.pip.r1.12.osx/artifacts/public/convert_graphdef_memmapped_format
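
For readers landing here with the same question, the downloaded `convert_graphdef_memmapped_format` binary is a command-line tool. The sketch below only assembles the command and, if asked, runs it. The `--in_graph` and `--out_graph` flags are not quoted anywhere in this thread; they are stated here as an assumption based on the TensorFlow tool of the same name, so check the tool's own help output first. The wrapper name `convert_to_pbmm` is hypothetical.

```python
import subprocess


def convert_to_pbmm(tool_path, pb_path, pbmm_path, dry_run=True):
    """Build (and optionally run) the .pb to .pbmm conversion command."""
    # Assumed flag names; verify them against the downloaded binary.
    command = [tool_path, f"--in_graph={pb_path}", f"--out_graph={pbmm_path}"]
    if not dry_run:
        # Raises CalledProcessError if the tool exits with a non-zero status.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    print(" ".join(convert_to_pbmm("./convert_graphdef_memmapped_format",
                                   "output_graph.pb", "output_graph.pbmm")))
```

With `dry_run=True` (the default) nothing is executed, which makes it easy to inspect the command before pointing it at a real model.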

lissyx on 19 Feb 2019

Thank you so much!

Hafsa26 on 20 Feb 2019

Do you mind sharing figures on how well your model performs? You also might want to export it to tflite format for Android support.

lissyx on 20 Feb 2019

@lissyx Yes, I will surely share soon. Up till now, I have worked on 1 hour of data and the system is working fine. I am still getting 100% WER, but I will tweak the model once I start working on the 300 hours of data. Initially I have to prepare a demo of DeepSpeech for the Urdu language.
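
For context on the 100% figure: word error rate is commonly defined as the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below implements that common definition; it is not taken from the DeepSpeech code base, whose reporting may differ in detail, and the function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # previous[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

Under this definition an empty output, or one where every word is wrong, scores 1.0 (100%), and extra inserted words can push the value above 1.0.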

Hafsa26 on 20 Feb 2019

If there is anything you can share to make it better, I would love to know.

Hafsa26 on 20 Feb 2019

I am not planning to use it on Android yet, but if I need to, I will surely do it. Thank you for helping all the way.

Hafsa26 on 20 Feb 2019

Please avoid images

lissyx on 20 Feb 2019

When I trained the model on one hour of data, the loss gradually decreased, but after 14 epochs it increases for some epochs and decreases for others.
What do you suggest in such a scenario?

Hafsa26 on 20 Feb 2019

> When I trained the model on one hour of data, the loss gradually decreased, but after 14 epochs it increases for some epochs and decreases for others.
> What do you suggest in such a scenario?

Not surprising with only one hour, nothing to conclude. You will have to adjust hyper-parameters, eventually, anyway.

lissyx on 20 Feb 2019

I will. I will be using 300 hours of data next, and then I will adjust the hyper-parameters accordingly.
Is there any guide for adjusting hyper-parameters?

Hafsa26 on 20 Feb 2019

> I will. I will be using 300 hours of data next, and then I will adjust the hyper-parameters accordingly.
> Is there any guide for adjusting hyper-parameters?

No, you need to run multiple explorative tests

lissyx on 20 Feb 2019

@Hafsa26 Trial and error honestly.

However, I'd start with parameters near what we have for the release model[[1](https://github.com/mozilla/DeepSpeech/releases/tag/v0.4.1)]

kdavis-mozilla on 20 Feb 2019

Hyperparameters for fine-tuning

The hyperparameters used to train the model are useful for fine tuning. Thus, we document them here along with the hardware used, a server with 8 TitanX Pascal GPUs (12GB of VRAM).

train_files Fisher, LibriSpeech, Switchboard training corpora, as well as a pre-release snapshot of the English Common Voice training corpus.
dev_files LibriSpeech clean and other dev corpora, as well as a pre-release snapshot of the English Common Voice validation corpus.
test_files LibriSpeech clean test corpus
train_batch_size 24
dev_batch_size 48
test_batch_size 48
epoch 30
learning_rate 0.0001
display_step 0
validation_step 1
dropout_rate 0.15
checkpoint_step 1
n_hidden 2048
lm_alpha 0.75
lm_beta 1.85

The weights with the best validation loss were selected at the end of the 30 epochs.
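
One way to use these values as a starting point for trial-and-error runs is to keep them in a dictionary and override one or two per experiment. The sketch below does that and renders the result as `--name value` pairs. That each name above maps one-to-one to a command-line flag of the training script is an assumption to verify against the version in use, and `to_flags` is a hypothetical helper.

```python
# Numeric values copied from the list above; the corpus entries are omitted
# because they are descriptions, not file paths.
RELEASE_HYPERPARAMETERS = {
    "train_batch_size": 24,
    "dev_batch_size": 48,
    "test_batch_size": 48,
    "epoch": 30,
    "learning_rate": 0.0001,
    "display_step": 0,
    "validation_step": 1,
    "dropout_rate": 0.15,
    "checkpoint_step": 1,
    "n_hidden": 2048,
    "lm_alpha": 0.75,
    "lm_beta": 1.85,
}


def to_flags(params, overrides=None):
    """Merge overrides into params and render them as ['--name', 'value', ...]."""
    merged = dict(params)          # copy, so the baseline stays untouched
    merged.update(overrides or {})
    flags = []
    for name, value in merged.items():
        flags.extend([f"--{name}", str(value)])
    return flags


if __name__ == "__main__":
    # Example: halve the batch size for a GPU with less memory.
    print(" ".join(to_flags(RELEASE_HYPERPARAMETERS, {"train_batch_size": 12})))
```

Keeping the baseline immutable and passing only the changed values makes each exploratory run easy to describe and compare.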

kdavis-mozilla on 20 Feb 2019

Thank you so much! I will update soon about Urdu language model results.

Hafsa26 on 20 Feb 2019

@Hafsa26 Can you share some details about the 300 hours of data you have? I am in the same boat, training with 150 hours of data. Let's collaborate to get the best out of it.

cmhashim on 20 Feb 2019

@cmhashim Have you prepared the demo? Which version are you working on?

Hafsa26 on 20 Feb 2019

> @cmhashim Have you prepared the demo? Which version are you working on?

@Hafsa26 I did the training with 3 hours of data, but the WER was high, as in your case. The issue is that I used audio clips longer than 10 seconds, from 30 seconds to 1 minute. Hence I am now segmenting the audio into clips shorter than 10 seconds.
Version v0.3.0
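
A rough sketch of the mechanical part of that segmentation, using only the standard `wave` module: it cuts one WAV file into consecutive pieces of at most `max_seconds`. This is an illustration under simplifying assumptions, not what anyone in the thread reported using. Fixed-length cuts can split words and break the match between audio and transcript, so a real pipeline would cut at pauses and re-align the text. The name `split_wav` and the output naming scheme are made up for the example.

```python
import os
import wave


def split_wav(src_path, out_dir, max_seconds=10.0):
    """Cut src_path into consecutive pieces of at most max_seconds each."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(src_path))[0]
    pieces = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_piece = int(max_seconds * src.getframerate())
        index = 0
        while True:
            frames = src.readframes(frames_per_piece)
            if not frames:
                break
            piece_path = os.path.join(out_dir, f"{stem}_{index:04d}.wav")
            with wave.open(piece_path, "wb") as dst:
                # Copies rate, width and channels; the frame count in the
                # header is corrected by the wave module after writing.
                dst.setparams(params)
                dst.writeframes(frames)
            pieces.append(piece_path)
            index += 1
    return pieces
```

The function returns the paths of the pieces in order, so each one can then be paired with its own transcript line.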

cmhashim on 20 Feb 2019

@kdavis-mozilla Can't we use audio longer than 10 seconds?
Or is there any other way to handle longer audio?

Hafsa26 on 20 Feb 2019

@cmhashim Have you tried changing the hyper-parameters? Use trial and error, as suggested by kdavis.

Hafsa26 on 20 Feb 2019

@Hafsa26 If your GPU has the memory, you can train on audio as long as you like. :-)

However, basically all commercial GPUs available today don't have enough memory to train on batches containing audio clips 1 min long.

kdavis-mozilla on 20 Feb 2019

@Hafsa26 Here is kdavis's reply to a similar query.
I followed that advice and am trying to shorten the training audio files. Long files are also a burden on the GPU, and training takes weeks.

cmhashim on 20 Feb 2019

@cmhashim Thank you!
Let me know if I can help you in any possible way. All the best.

Hafsa26 on 20 Feb 2019

@Hafsa26 Any info on the 300 hours of data? Was it developed by you, or is it already available?

cmhashim on 20 Feb 2019

Unfortunately, I couldn't get it. @cmhashim

Hafsa26 on 6 Mar 2019

@cmhashim Can you help me in this regard ?

Hafsa26 on 6 Mar 2019

@Hafsa26 I want to know about the Urdu corpus you have used. Let me know the details. How did you obtain it? I can't find any publicly available Urdu corpus of that size.

cmhashim on 6 Mar 2019

@Hafsa26 Any updates on the Urdu language model trained on 300 hours of data?

cmhashim on 28 May 2019

@Hafsa26 How do I obtain a trie for the Urdu language? I can't use the existing one, which is built for English. Am I right, @kdavis-mozilla?
I have built lm.binary using KenLM.
To obtain the trie, I need to run the command /util/generate_trie alphabet.txt lm.binary vocabulary.txt trie
but I can't find generate_trie in util.
@lissyx Can you help me?

cmhashim on 31 May 2019

@cmhashim generate_trie is downloaded when one runs

kdavis-19htdh:DeepSpeech kdavis$ python3 util/taskcluster.py --target tc/
Downloading https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.osx/artifacts/public/native_client.tar.xz ...
Downloading: 100%

x generate_trie
x libdeepspeech.so
x LICENSE
x deepspeech
x README.mozilla

See here for more details.
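
Once the binary has been unpacked into `tc/`, the call quoted earlier in the thread (`generate_trie alphabet.txt lm.binary vocabulary.txt trie`) can be wrapped so that missing inputs are caught before the tool runs. This is a sketch that takes the argument order exactly as quoted in the thread; the order and number of arguments may differ between DeepSpeech versions, and the helper name is hypothetical.

```python
import os


def generate_trie_command(alphabet, lm_binary, vocabulary, trie_out,
                          tool="./tc/generate_trie"):
    """Return the argv list for generate_trie after checking the inputs exist."""
    for path in (alphabet, lm_binary, vocabulary):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"missing input file: {path}")
    # Argument order as quoted in the thread; confirm it for your version.
    return [tool, alphabet, lm_binary, vocabulary, trie_out]
```

The returned list can be passed to `subprocess.run(command, check=True)` to actually build the trie.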

kdavis-mozilla on 31 May 2019

Download the native client compatible with your version of DeepSpeech.
It contains the generate_trie executable.
Follow the commands kdavis mentioned.
All the best.

Hafsa26 on 31 May 2019

> @cmhashim generate_trie is downloaded when one runs
>
> ```shell
> kdavis-19htdh:DeepSpeech kdavis$ python3 util/taskcluster.py --target tc/
> ```

Thanks @kdavis-mozilla, it worked. Usage: ./tc/generate_trie

cmhashim on 31 May 2019

@Hafsa26 Thanks for the info.
Can you state the training command with all flags so that the generated LM files are included?

cmhashim on 31 May 2019

Which version are you working on?

Hafsa26 on 31 May 2019

> Which version are you working on?

$pip3 list | grep deepspeech
deepspeech-gpu         0.4.1

cmhashim on 1 Jun 2019

> I wanted to use this model for the Urdu language, but I found this in the FAQ:
> "DeepSpeech's requirements for the data is that the transcripts match the [a-z ]+ regex, and that the audio is stored WAV (PCM) files."
>
> How can I design a neural network for speech transcription for languages like Urdu?

Hi, I wanted to know whether you had any success with the Urdu language model. I am currently working on Urdu speech-to-text for my final-year project and would love to get some help and guidance.

areeba97 on 20 Sep 2019

There is a complete guide for building a language model for another language.
I will try to find that link.
You can look for it too.
Follow those instructions.
Once you build it, you will be fine.
Are you a master's student or a bachelor's student?

Hafsa26 on 20 Sep 2019

I am a bachelor's student. Yes please, any help would be appreciated. I also need some guidance on collecting Urdu audio data.

areeba97 on 20 Sep 2019

It's possible now to work in Urdu or any other spoken language supported by UTF-8.

kdavis-mozilla on 10 Jan 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock[bot] on 9 Feb 2020

Deepspeech: Use this model for Urdu language
