Hey there!
Thanks for the work you are doing to make labelled audio datasets more widely available for training, under public domain. Unfortunately, while I see some language expressing disappointment in the big tech companies for keeping their tech to themselves, I don't see credits to the organizations whose tech is being used to do the STT training in the background. Someone on Hacker News mentioned it is a TensorFlow implementation based on DeepSpeech.
If it is, (a) credit big corporation 1: Google, (b) credit big corporation 2: Baidu. You can't stand on the shoulders of giants and then dis them.
If all the tech is made by Mozilla from scratch, I apologize, and you don't need to provide any credit.
Varun
Hi @varunarora, thanks for your feedback here.
I think there are two different concepts here that I want to separate:
Common Voice is focusing squarely on number 2, which indeed has (so far) not been aided by Google or Baidu, even though they both possess this data. We are not calling them bad or evil (as indeed putting data like this in the public domain has lots of legal and technical challenges); we are just calling out the situation as we see it: "locked up".
Perhaps the criticism here is about the language we use on our homepage, since it doesn't make it clear what "technology" is "locked up" in big corporations. In that case, you have a point and we can look at changing the language there.
Closing this bug since there is not too much actionable here.
Oh, I was excited about the latter bit of your response: examining changing the language there to make it clear what technology is locked up. Super actionable! And yes, you are right, this initiative is all about the data; "technology" isn't even the best choice of word here. Could this please be considered?
Negative rhetoric isn't the only way to get volunteers excited, Mike :) I feel sad that Mozilla, in general, has to resort to it (e.g. https://www.alexkras.com/firefox-launches-marketing-offensive-against-chrome/). The technology community is the furthest from indoctrination and brainwashing; we don't need to be consumers of such language.
@varunarora, I proposed some language changes in #388. What do you think of them?
@Omniscimus I think that your language is wonderful! More accurate and respectful, yet it gets the point about "lack of access" across. Thank you very much :+1:
Thanks for all your help @varunarora and @Omniscimus. Those updates look succinct and do a great job of clarifying some of our language. Merging 👍