Common-voice: Kabyle locale: Corpus sent

Created on 20 Jun 2018 · 11 Comments · Source: mozilla/common-voice

We started sending sentences for the Kabyle language, but we don't know whether they are being received anywhere. Any information? Should we go on with our corpus?


All 11 comments

@belkacem77 Have you sent them through a PR? I don't see anything at https://github.com/mozilla/voice-web/pulls :-)

@lissyx we sent them via a form we got from Michael. I'd like to know about the other locales, such as French for example. How did they upload their corpora?
Puzzled.

@belkacem77 I guess the form you refer to is the voice sprint one, https://voice-sprint.mozilla.community/upload/ ? As far as I remember, Michael was handling that manually; he might be on PTO after the San Francisco All Hands we had last week.

For French, we built tooling to extract sentences from various datasets, and the result was then imported into Crowdin for validation: https://crowdin.com/project/common-voice-corpus

Then it is imported as a PR on this repo. So in your case, I guess what is currently missing is importing the data you sent through the form into Crowdin.

Maybe we should ping @mikehenrty :-)

@lissyx, yes, these are the links I received as well. We started sending corpora but still don't know whether they are being collected anywhere, so we stopped sending them via the form. We are now trying to record outside of Common Voice and use DeepSpeech later to train the models locally.

@belkacem77 Well, it depends on when you sent it, but people have been rather busy with the launch of Common Voice before the All Hands, and then with the All Hands itself, so I wouldn't be that surprised if Michael is lagging a bit behind; technically, I'm also waiting on him to unblock some things on Crowdin for the French locale :-).

I'm also very interested if you hack on DeepSpeech; you should join us on IRC and/or on Discourse.

@belkacem77 hey there! If you want, you can send me the data and I can make a PR, or you can fork the voice-web repository and make the PR yourself. The ideal way to send the data if you want to send it to me would be to just put the sentences in a pastebin and give me the link.

@lissyx right, for now we will pause sending the corpora via the form ( https://voice-sprint.mozilla.community/upload/ ) until we get feedback from Michael. In the meantime, we have to select more resources for the corpora with the kab team before sending them.
I'm just waiting to get technical things unblocked to get more involved with DeepSpeech.

@ftyers I didn't know we could do that via a PR!!!!! I can do that. Thanks

@belkacem77 So, not sure if you got news from Michael, but on my side I got some, and it looks like the most efficient way right now is to open a PR :-)

see #1133

Hi, thanks everyone. We will make a PR for the next corpora.
