Common-voice: Credit where credit is due

Created on 18 Jul 2017 · 7 Comments · Source: mozilla/common-voice

Hey there!

Thanks for the work you are doing to make labelled audio datasets more widely available for training, in the public domain. Unfortunately, while I see some language expressing disappointment with the big tech companies for keeping their tech to themselves, I don't see credit given to the organizations whose tech is being used to do the training for STT in the background. Someone on Hacker News mentioned it is a TensorFlow implementation based on DeepSpeech.

If it is, (a) credit big corporation 1: Google, (b) credit big corporation 2: Baidu. You can't stand on the shoulders of giants and then dis them.

If all the tech is made by Mozilla from scratch, I apologize; you don't need to provide any credit.

Varun

varunarora

👎1

All 7 comments

Hi @varunarora, thanks for your feedback here.

I think there are two different concepts here that I want to separate:

1. Machine learning algorithms (e.g. RNNs) and tools (e.g. TensorFlow)
2. The data used to train the algorithm (in this case, labelled audio clips)

Common Voice is focusing squarely on number 2, which indeed has (so far) not been aided by Google or Baidu, even though they both possess this data. We are not calling them bad or evil (indeed, putting data like this in the public domain has lots of legal and technical challenges); we are just calling out the situation as we see it: "locked up".

Perhaps the criticism here is around the language we use on our homepage, since it doesn't make it clear what "technology" is "locked up" in big corporations. In which case, you have a point and we can examine changing the language there.

mikehenrty on 18 Jul 2017

Closing this bug since there is not too much actionable here.

mikehenrty on 28 Jul 2017

Oh, I was excited about the latter bit of your response, which is examining changing the language there to make it clear what technology is locked up. Super actionable! And yes, you are right, this initiative is all about the data; the term "technology" isn't even the best choice of word here. Could this please be considered?

Negative rhetoric isn't the only way to get volunteers excited, Mike :) I feel sad that Mozilla, in general, has to resort to it (e.g. https://www.alexkras.com/firefox-launches-marketing-offensive-against-chrome/). The technology community is the furthest from indoctrination and brain-washing; we don't need to be consumers of such language.

varunarora on 30 Jul 2017

👍1

@varunarora, I proposed some language changes in #388. What do you think of them?

Omniscimus on 30 Jul 2017

👍1

@Omniscimus I think that your language is wonderful! More accurate and respectful, yet it gets the point about "lack of access" across. Thank you very much 👍

varunarora on 1 Aug 2017

❤2

Thanks for all your help @varunarora and @Omniscimus. Those updates look succinct and do a great job of clarifying some of our language. Merging 👍

mikehenrty on 10 Aug 2017

https://github.com/mozilla/voice-web/commit/45fa2c7777309160b165bf4ba81439c43bc6dba0

mikehenrty on 10 Aug 2017
