Common-voice: [nl] Create initial dataset of 500 Dutch sentences

Created on 11 Aug 2018 · 15 Comments · Source: mozilla/common-voice

I would love to get the Dutch language going for this project. Like @highsource I have made a fork and will be making a list of sentences.

Most of the sentences will be translated from the English list, but I will also try to make some originals ;)

All 15 comments

Hey @pcmill how is it going?

I wonder, what is your method to gather/create sentences?

I mostly (roughly) translate sentences from English. I made a list of categories and kind of filled them:

  • General
  • Foods and Drinks
  • Months and Times
  • Medical
  • Nature
  • Religion
  • Social
  • Sports
  • Proverbs
  • Technology
  • Business

My sentences were actually merged today, but I think I can add more that reference locations in Belgium, since Dutch is also spoken there.

@pcmill Another big merge of Dutch sentences also went in today, done in collaboration with Sjoerd. I included many locations there as well, but the more the better.

Just saw that the limit has gone up from 500 sentences to 5000. Therefore we need an additional 1500 for the launch. @pcmill @sroet, interested in setting up a different collaboration channel?

Sure, sounds like a good idea.

@danielsjf Where did you see the 5k limit?

If you go to the website, go to languages, click on the tab progress and you will see it. Under Dutch, it says 3600/5000.

@danielsjf Ok, I see, thanks. Russian is at 0 at the moment. I'm working on a fork, but it hasn't been merged yet.

But it is good to know what volume is required. That wasn't apparent to me before.

@danielsjf @pcmill Sorry, I will not be able to contribute constructively anytime soon (I'm currently in the middle of moving countries). I should be able to proofread, if required.

No problem @sroet, I will try my best to come up with some more sentences. You already made most of them anyway.

We are live on the website with Dutch, so I can close this issue :smile:

Finally! Thanks for the great work guys!

Awesome stuff y'all!

Submitted the news on Tweakers and now we are already at 2h30 recorded and 130 speakers ;-)

Dutch is one of our fastest growing datasets right now! Great work everyone!
