This issue aims to unblock the situation with Hindi.
This is the process to make it possible, the same one we used for Portuguese:
- hi copy from hi-IN on the locales folder on voice-web
- hi and the locale is added to Pontoon
- hi-IN removed in Pontoon
- hi added via import-locales on voice-web
- hi-IN removed in voice-web locales folder (and via import-locales)
- hi sentences

hi locale folder is now on the code via https://github.com/mozilla/voice-web/commit/9ea082c6ec5743daeee97cac1e9ac3431f68c25f
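For anyone repeating this for another locale, the folder part of the checklist (copy `hi-IN` to `hi`, then remove `hi-IN`) could be sketched roughly as below. This is only an illustration: `rename_locale` is a hypothetical helper, not something in voice-web, and it does not touch Pontoon or run import-locales.

```python
import shutil
from pathlib import Path


def rename_locale(locales_dir: Path, old: str, new: str) -> Path:
    """Copy locales_dir/old to locales_dir/new, then delete the old folder.

    Hypothetical helper for illustration; the real migration also involved
    Pontoon changes and the import-locales step, which are not covered here.
    """
    src = locales_dir / old
    dst = locales_dir / new
    if not src.is_dir():
        raise FileNotFoundError(f"source locale folder not found: {src}")
    if dst.exists():
        raise FileExistsError(f"target locale folder already exists: {dst}")
    shutil.copytree(src, dst)  # copy first so nothing is lost if this fails
    shutil.rmtree(src)         # remove the old folder only after a full copy
    return dst
```

Copying before deleting means a failed copy leaves the original `hi-IN` folder untouched.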
Pontoon already has the "hi" locale; this PR is pending to add it to our JSON files via import-locales.
hi-IN is no longer on Pontoon and has been removed from this repo via 0006ea9ddd5ca5eecd0fc27406a2064c72af3792
@phirework all changes on our side are done, the site should now be loading /hi/
@MichaelKohler let's see if we can get "hi" sentences exported
Thanks everyone!
Pontoon part is complete with metadata migration.
I can verify that the export would now work. At the current stage, 59 sentences would be exported. However I have reason to believe that most of the 29k sentences that are in Sentence Collector are a copyright violation if we add them. Removing those would leave us with around 3 sentences to export.
Source mentioned in the records: Press Information Bureau, Govt of India https://pib.gov.in/indexd.aspx
https://pib.gov.in/content/102_2_Copyright-Policy.aspx seems to require attribution though, if I understand everything correctly. I'll hold off the export for now.
It may be possible to do per-sentence attribution soon but the sentences will have to be processed the way the Europarl sentences were treated, i.e. as a separate export with a separate QA process, instead of being part of the generic sentence-collector.txt. Either way, legal should probably be flagged on this.
@mbransn
I don't think we need legal's time on this if we can solve it ourselves. If the source is the one Michael mentioned, it's clearly noted that it's not public domain and we should remove the sentences, as we have done in the past with other sources we identified as non-public domain.
Agreed & works for me. 👍
Deleted the copyright-infringing sentences and did an export: https://github.com/mozilla/voice-web/pull/2722. This leaves us with 3 exported Hindi sentences.
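For similar cleanups in the future, here is a minimal sketch of separating sentences by their recorded source before an export. The `SentenceRecord` shape and the `split_by_source` helper are assumptions made for illustration; they are not Sentence Collector's actual schema or API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SentenceRecord:
    # Hypothetical record shape: the sentence text plus its free-text source.
    text: str
    source: str


def split_by_source(
    records: Iterable[SentenceRecord],
    blocked_markers: Iterable[str],
) -> Tuple[List[SentenceRecord], List[SentenceRecord]]:
    """Return (keep, drop), where drop holds every record whose source
    contains one of the blocked markers (case-insensitive substring match)."""
    markers = [m.lower() for m in blocked_markers]
    keep: List[SentenceRecord] = []
    drop: List[SentenceRecord] = []
    for record in records:
        source = record.source.lower()
        if any(marker in source for marker in markers):
            drop.append(record)
        else:
            keep.append(record)
    return keep, drop
```

For example, calling `split_by_source(records, ["pib.gov.in"])` would put every record whose source mentions that domain into the `drop` list, so it can be reviewed before anything is deleted.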
Thanks @MichaelKohler.