DeepSpeech: Use this model for Urdu language

Created on 15 Jun 2017 · 79 Comments · Source: mozilla/DeepSpeech

I wanted to use this model for the Urdu language, but I found this in the FAQ:
''
DeepSpeech's requirements for the data is that the transcripts match the [a-z ]+ regex, and that the audio is stored WAV (PCM) files. ''

How can I design a neural network for speech transcription for languages like Urdu?

P4 enhancement

All 79 comments

While we plan to target other languages, we haven't made any decision as to which is the next language to target yet. If you've sufficient speech data for Urdu, thousands of hours of speech, we'd be willing to help in modifying our code for Urdu and lending some server resources for training.

Actually, we are trying to make changes in spell.py and text.py for the Urdu language, and we are also working on a language model for Urdu. We have an Urdu corpus on which we will do our training. Is this the right approach?

@MalikMahnoor Sounds about right. (I'd have to see the details to be sure.) How large an Urdu corpus do you have?

700 sentences along with their audio, but we are using this just to build a prototype. We can collect more data if this corpus shows good results.

By the way, your spell.py and text.py work fine for Urdu as well. We have built our language model and changed the dataset to Urdu. The code works fine up to the creation of the execution context, but it gives errors during training. To our understanding, the errors are caused by n_characters (which we have also changed to the number of characters in Urdu), but there are other errors too.

Could you post the errors you're getting? Maybe we can help.

We have managed to fix those errors, and it now goes into training. The code works fine now, but only for isolated words, not sentences. We are trying to fix text.py for that. Hopefully we'll be able to do that within a few days.

Awesome!

Thanks!

@MalikMahnoor When you get an Urdu model up and running and want to distribute it to the world, we'd be happy to help host the model for you, for example by providing S3 storage so others can download it.

Hi @MalikMahnoor, I am also working on Urdu speech recognition, but using a different approach. I have already tried the single-speaker 700-sentence corpus recorded by Agha Ali. It is not a useful corpus, and I am now planning to add data from new sources. We can collaborate. Thanks.

@abbasrazaali @MalikMahnoor I would suggest you also take a look at Common Voice. They are working on localization and internationalization, which would help you augment the corpus.

@kdavis-mozilla Are there any specific requirements for the audio recordings you need? What if we provide you with thousands of hours of Urdu TV/radio recordings? Please specify any such requirements. Can you also explain what type of code changes are needed to support Urdu?

@sajjadsaleem I don't know if there are hard and fast _requirements_. However, there are some things which we have found to work.

  • Audio recorded at least at 16 kHz, 16-bit, and in mono (stereo _may_ work too)
  • Audio fed to the system at 16 kHz, 16-bit, mono
  • Text "normalized" so that the transcription corresponds to exactly what is in the audio
  • Audio segmented into segments of about 1-15 seconds before being fed to the system
  • Audio need _not_ be perfectly free of noise
  • Audio should reflect the environment in which the system is to be used (if it is used in a noisy area, the audio should be noisy too, so the system learns to deal with noise)
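
The audio points in the list above can be checked mechanically before training. Below is a minimal sketch using only Python's standard `wave` module; the function name `check_wav` and the default limits (16 kHz, 16-bit, mono, 1-15 seconds) are illustrative choices taken from the list, not part of DeepSpeech itself.

```python
import sys
import wave


def check_wav(path, rate=16000, sample_width=2, channels=1,
              min_seconds=1.0, max_seconds=15.0):
    """Return a list of problems found in a PCM WAV file (empty if none).

    The default limits mirror the guidelines quoted in this thread; they are
    assumptions for illustration, not values enforced by DeepSpeech.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {rate} Hz")
        if wav.getsampwidth() != sample_width:
            problems.append(
                f"sample width is {wav.getsampwidth() * 8} bit, "
                f"expected {sample_width * 8} bit")
        if wav.getnchannels() != channels:
            problems.append(
                f"{wav.getnchannels()} channels, expected {channels}")
        # Duration in seconds = number of frames / frames per second.
        duration = wav.getnframes() / wav.getframerate()
        if not min_seconds <= duration <= max_seconds:
            problems.append(
                f"duration is {duration:.2f} s, expected "
                f"{min_seconds}-{max_seconds} s")
    return problems


if __name__ == "__main__":
    # Usage: python check_wav.py clip1.wav clip2.wav ...
    for wav_path in sys.argv[1:]:
        for problem in check_wav(wav_path):
            print(f"{wav_path}: {problem}")
```

Clips that fail the rate, width, or channel checks would need resampling with an external tool, and clips that are too long would need to be segmented together with their transcripts.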

As for supporting Urdu, you'll need to make changes similar to those required for French support, described here[1], or German support, described here[2].

Hi, it's me, Sehar Gul.
DeepSpeech is new to me. I have to train it for the Urdu language. Can you help me with how to train it for Urdu?

@sehargul A good start is the discourse post[1]; further discussion can be had there.

Any updates on the Urdu model?

Hi. I couldn't find the spell.py file in DeepSpeech master (version 0.2.0 alpha 0). What could be the substitute for it?
Thank you!

@kdavis-mozilla, can you please answer my query? What could be the substitute for the spell.py file in DeepSpeech master, version 0.2.0 alpha 0?
Thank you!

@Hafsa26 There have been a lot of changes since spell.py was in the repo. Could you say a little more about what you want to do?

I am working on an Urdu speech recognition system using DeepSpeech. As you said above, we need to make changes in text.py and spell.py for it. I found text.py in the repo but couldn't find spell.py.
So what could be the solution? Secondly, if you have any blog post or guide on building a speech recognition system for another language using DeepSpeech, please share it. Thank you!

@Hafsa26 I guess I'm looking more towards: What is your goal? spell.py is no longer in the repo, but the functionality it provided is. So, I need to know what functionality you are trying to use so I can point you in the right direction.

@MalikMahnoor What is the status of your work on the Urdu language model? Can you share it?

@kdavis-mozilla I want to create my own language model for the Urdu language. Can you please help me with this? I've collected approximately 9000 audio recordings in Urdu of 100 different sentences. Currently I am training on this data with Roman transcriptions, but I want to train with Urdu-script transcriptions.

What's wrong with the current documentation? Everything you need to achieve that should be documented there.

@lissyx Can you please elaborate on which documentation you are talking about, or share it here? I've never found any for languages other than English.

What about README.md? I really don't understand what's blocking you.

@lissyx - the README.md has all the info needed, but I will admit it's hard to pick it out for newcomers... maybe it's time to write a blogpost for "how to train DeepSpeech on a new language"?

Maybe, but again, if we don't know the pain points, it's less efficient. If you ask me, it's trivial and all properly documented. Obviously that's not the case, and thus I'm unsure I can produce anything more useful than the existing documentation.

I've been running into all the pain points getting DS to work with all the CV langs, so I definitely could write up that post... I'm just concerned about how much time it would take - a week or so, I'd guess.

When I finish the Windows parts I'll start working on it for Spanish, @JRMeyer I can share with you the "hardest parts" if you want.

> @lissyx - the README.md has all the info needed, but I will admit it's hard to pick it out for newcomers... maybe it's time to write a blogpost for "how to train DeepSpeech on a new language"?

It would be very helpful indeed.

> What about README.md? I really don't understand what's blocking you.

I just need to know whether DeepSpeech supports RTL transcription for languages like Arabic and Urdu.

@waqasr6 I know developers outside of Mozilla have used it for Urdu, but we at Mozilla have never used it for such.

> I just need to know whether DeepSpeech supports RTL transcription for languages like Arabic and Urdu.

What kind of constraints do you have in mind? We have support for UTF-8, so characters should be handled properly, and RTL should then not be a problem, since that is how the training will be done.
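
Since the characters are handled as UTF-8, the main language-specific input is the list of characters the model can output. The following is a minimal sketch of deriving that list from Urdu transcripts. It assumes the alphabet file format used by DeepSpeech releases of this period (one character per line, lines starting with `#` treated as comments, a literal `#` escaped as `\#`); the function names and file path are hypothetical. The NFC normalization step is a precaution so that visually identical characters share one code-point sequence, not a DeepSpeech requirement.

```python
import unicodedata


def build_alphabet(transcripts):
    """Return the sorted list of distinct characters used in the transcripts."""
    chars = set()
    for line in transcripts:
        # Normalize so composed/decomposed forms of a character are merged.
        normalized = unicodedata.normalize("NFC", line.strip("\r\n"))
        chars.update(normalized)
    return sorted(chars)


def write_alphabet(chars, path):
    """Write one character per line, UTF-8 encoded.

    Assumption: a leading '#' marks a comment in the alphabet file, so a
    literal '#' label is written as '\\#'.
    """
    with open(path, "w", encoding="utf-8") as f:
        for ch in chars:
            f.write(("\\#" if ch == "#" else ch) + "\n")


if __name__ == "__main__":
    # Two short Urdu sentences stand in for a real transcript file.
    sample = ["یہ اردو ہے", "اردو زبان"]
    alphabet = build_alphabet(sample)
    write_alphabet(alphabet, "alphabet.txt")  # hypothetical output path
    print(len(alphabet), "distinct characters")
```

The text direction never enters the picture here: the transcripts are stored in logical order, and right-to-left rendering is left to whatever displays the output.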

@lissyx Thanks. Many things in my mind are cleared now. I'll try it with Urdu language model now.

@lissyx Hi, how do I convert the output_graph.pb model into a .pbmm model?
I got my Urdu language model with a .pb extension. Is there any way to convert it into .pbmm?

Thank you!

@Hafsa26 Have you read README.md ?

I did. To check the model, I need output_graph.pbmm, but I got output_graph.pb.
Do I need to make some changes to get a .pbmm graph rather than a .pb graph?

I think what lissyx is referring to is this.

Thank you so much!

Do you mind sharing figures on how well your model performs? You also might want to export it to tflite format for Android support.

@lissyx Yes, I will surely share soon. So far, I have worked with 1 hour of data and the system is working fine. I am still getting 100% WER, but I will tweak the model once I start working with 300 hours of data. Initially I have to prepare a demo of DeepSpeech for the Urdu language.

If there is anything you can share to make it better, I would love to know.

I am not planning to use it on Android yet, but if I need to, I will surely do it. Thank you for helping all the way.
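
For readers unfamiliar with the metric mentioned above: word error rate is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. A WER of 100% therefore means the number of word errors equals the number of reference words, for example when every word is wrong. Below is a minimal, self-contained sketch of that standard definition; it is not DeepSpeech's own evaluation code, and the function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("a b c d", "a x c"))  # one substitution, one deletion
```

Because insertions count as errors, WER can exceed 100% when the output contains many more words than the reference.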

[image]

Please avoid images

When I trained the model on one hour of data, the loss decreased gradually, but after 14 epochs it increases for some epochs and decreases for others.
What do you suggest in such a scenario?

Not surprising with only one hour, nothing to conclude. You will have to adjust hyper-parameters, eventually, anyway.

I will. I will be using 300 hours of data next, and then I will adjust the hyper-parameters accordingly.
Is there any guide for adjusting hyper-parameters?

No, you need to run multiple exploratory tests.

@Hafsa26 Trial and error honestly.

However, I'd start with parameters near what we have for the release model[[1](https://github.com/mozilla/DeepSpeech/releases/tag/v0.4.1)]

Hyperparameters for fine-tuning

The hyperparameters used to train the model are useful for fine tuning. Thus, we document them here along with the hardware used, a server with 8 TitanX Pascal GPUs (12GB of VRAM).

  • train_files Fisher, LibriSpeech, Switchboard training corpora, as well as a pre-release snapshot of the English Common Voice training corpus.
  • dev_files LibriSpeech clean and other dev corpora, as well as a pre-release snapshot of the English Common Voice validation corpus.
  • test_files LibriSpeech clean test corpus
  • train_batch_size 24
  • dev_batch_size 48
  • test_batch_size 48
  • epoch 30
  • learning_rate 0.0001
  • display_step 0
  • validation_step 1
  • dropout_rate 0.15
  • checkpoint_step 1
  • n_hidden 2048
  • lm_alpha 0.75
  • lm_beta 1.85

The weights with the best validation loss were selected at the end of the 30 epochs.
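
As a starting point for the trial-and-error runs suggested above, the numeric values from this list can be kept in one place and turned into command-line arguments. This is only a sketch: it assumes the names in the release notes correspond one-to-one to flags of the training script in that release, and the script name and CSV paths in the usage part are hypothetical placeholders.

```python
# Numeric hyperparameters copied from the release notes quoted above.
RELEASE_HYPERPARAMETERS = {
    "train_batch_size": 24,
    "dev_batch_size": 48,
    "test_batch_size": 48,
    "epoch": 30,
    "learning_rate": 0.0001,
    "display_step": 0,
    "validation_step": 1,
    "dropout_rate": 0.15,
    "checkpoint_step": 1,
    "n_hidden": 2048,
    "lm_alpha": 0.75,
    "lm_beta": 1.85,
}


def to_flags(params, **overrides):
    """Turn a parameter dict into ['--name', 'value', ...] arguments.

    Assumption: each key is also the name of a training flag.
    Keyword overrides replace or extend the baseline values.
    """
    merged = {**params, **overrides}
    flags = []
    for name, value in merged.items():
        flags += [f"--{name}", str(value)]
    return flags


if __name__ == "__main__":
    # Hypothetical script name and data paths; smaller n_hidden and batch
    # size as one candidate for a much smaller corpus.
    command = ["python3", "DeepSpeech.py"] + to_flags(
        RELEASE_HYPERPARAMETERS,
        train_files="urdu/train.csv",
        dev_files="urdu/dev.csv",
        test_files="urdu/test.csv",
        n_hidden=1024,
        train_batch_size=8,
    )
    print(" ".join(command))
```

Keeping the baseline in a dict makes it easy to log exactly which values differed between exploratory runs.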

Thank you so much! I will update soon about Urdu language model results.

@Hafsa26 Can you share some details about the 300 hours of data you have? I am in the same boat, training with 150 hours of data. Let's collaborate to get the best out of it.

@cmhashim Have you prepared the demo? Which version are you working on?

@Hafsa26 I did the training with 3 hours of data, but the WER was high, as in your case. The issue is that I used audio clips longer than 10 seconds, from 30 seconds to 1 minute. Hence I am now segmenting the audio into clips shorter than 10 seconds.
Version v0.3.0

@kdavis-mozilla Can't we use audio longer than 10 seconds?
Or is there any other way to handle longer audio?

@cmhashim Have you tried changing the hyper-parameters? Use trial and error, as suggested by kdavis.

@Hafsa26 If your GPU has the memory, you can train on audio as long as you like. :-)

However, basically all commercial GPUs available today don't have enough memory to train on batches containing audio clips of length 1 min.

@Hafsa26 Here is kdavis's reply to a similar query.
I followed that reply and am trying to shorten the training audio files. Long clips are also a burden on the GPU; training takes weeks.

@cmhashim Thank you!
Let me know if I can help you in any possible way. All the best.

@Hafsa26 Any info on the 300 hours of data? Did you develop it yourself, or is it already available?

Unfortunately, I couldn't get it. @cmhashim

@cmhashim Can you help me in this regard ?

@Hafsa26 I want to know about the Urdu corpus you have used. Let me know the details. How did you obtain it? I can't find any publicly available Urdu corpus of that size.

@Hafsa26 Any updates on the Urdu language model trained on 300 hours of data?

@Hafsa26 How do I obtain a trie for the Urdu language? I can't use the existing one, which is built for English. Am I right, @kdavis-mozilla?
I have built lm.binary using KenLM.
To obtain the trie, I need to use the command /util/generate_trie alphabet.txt lm.binary vocabulary.txt trie
I can't find generate_trie in util.
@lissyx Can you help me?

@cmhashim generate_trie is downloaded when one runs

kdavis-19htdh:DeepSpeech kdavis$ python3 util/taskcluster.py --target tc/
Downloading https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.osx/artifacts/public/native_client.tar.xz ...
Downloading: 100%

x generate_trie
x libdeepspeech.so
x LICENSE
x deepspeech
x README.mozilla

See https://github.com/mozilla/deepspeech#using-the-command-line-client for more details.

Download the native client compatible with your version of DeepSpeech.
It includes the generate_trie executable.
Follow the commands kdavis mentioned.
All the best.

Thanks @kdavis-mozilla, it worked. Usage: ./tc/generate_trie

@Hafsa26 Thanks for the info.
Can you state the training command with all flags, so that the generated LM files are included?

Which version are you working on?

> Which version are you working on?

$pip3 list | grep deepspeech
deepspeech-gpu         0.4.1 

Hi, I wanted to know whether you had any success with the Urdu language model. I am currently working on Urdu speech-to-text for my final-year project and would love to get some help and guidance.

There is a complete guide for building a language model for another language.
I will try to find that link.
You can look for it too.
Follow those instructions.
Once you build it, you will be fine.
Are you a master's student or a bachelor's student?

I am a bachelor's student. Yes please, any help would be appreciated. I also need some guidance on collecting Urdu audio data.

It's possible now to work in Urdu or any other spoken language supported by UTF-8.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
