Azure-docs: batch transcription in python

Created on 21 Jan 2019 · 13 Comments · Source: MicrosoftDocs/azure-docs

Can you provide some samples showing how to use batch transcription with Python?

Document Details

⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.

ID: 73fb7e76-fb80-c420-2efe-e3ca3a27b749
Version Independent ID: 07a42557-deb3-e4b3-3f14-f26a27be49c2
Content: How to use Batch Transcription - Speech Services
Content Source: articles/cognitive-services/Speech-Service/batch-transcription.md
Service: cognitive-services
GitHub Login: @PanosPeriorellis
Microsoft Alias: panosper

Pri2 assigned-to-author cognitive-services/svc doc-enhancement speech-service/subsvc triaged

kiranmahto

👍5

Most helpful comment

I am incredibly pissed off with Microsoft's documentation and usability on this.

I had the same issue re: invalid swagger_client Python code.
Why does the normal Azure API only support WAV? I can see no possible reason for this.
Why does Microsoft need so many domains (cris.ai, videoindexer.ai, etc.), which just make life harder re: cookies, etc.?
Then I got HTTP response body: {"code":"InvalidProductId","message":"The subscription SKU \"CognitiveServices.S0\" is not supported in this service instance."}
Then I upgraded my free Azure account to pay-as-you-go.
It took ages for Azure to realize that my subscription had been updated.
Now I'm getting HTTP response body: {"code":"Unauthorized","message":"Authentication is required to access the resource."}

Dear Microsoft, the year is 2019 and you are desperately trying to compete with AWS. Everyone else just provides an SDK that works. I have never, ever had this many unnecessary problems with AWS. Please provide documentation and sane ways of interacting with your services. It's honestly not that hard. Absolutely shocking!

codinguncut on 7 Jun 2019

👍4

All 13 comments

@kiranmahto Thank you for your interest in Azure products and services. This is being assigned to the content author to have a look and update as appropriate.

CHEEKATLAPRADEEP-MSFT on 21 Jan 2019

This will be very helpful

kishan19 on 3 Apr 2019

Whenever I am told to use Azure with something other than .NET, it doesn't work or causes a lot of problems.
I think that when you publish an API, you should have examples and documentation.
I don't understand how a company like Microsoft leaves aside the rest of the world that has to use languages like Python, even more so in the field of data science.

deivit78 on 15 Apr 2019

👍1

There are details of a Python sample being uploaded today. We are also uploading related documentation on how to automatically generate client libraries using our Swagger docs.

PanosPeriorellis on 15 Apr 2019

Testing the Python script:
Connected to pydev debugger (build 191.6605.12)
Traceback (most recent call last):
  File "/home/dave/programas/pycharm-2019.1.1/helpers/pydev/pydevd.py", line 1741, in <module>
    main()
  File "/home/dave/programas/pycharm-2019.1.1/helpers/pydev/pydevd.py", line 1735, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/dave/programas/pycharm-2019.1.1/helpers/pydev/pydevd.py", line 1135, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "/home/dave/programas/pycharm-2019.1.1/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/dave/PycharmProjects/azure/main.py", line 8, in <module>
    import swagger_client as cris_client
  File "/home/dave/PycharmProjects/googleTest/azure/lib/python3.6/site-packages/swagger_client/__init__.py", line 19, in <module>
    from swagger_client.api.custom_speech_accuracy_tests_api import CustomSpeechAccuracyTestsApi
  File "/home/dave/PycharmProjects/googleTest/azure/lib/python3.6/site-packages/swagger_client/api/__init__.py", line 6, in <module>
    from swagger_client.api.custom_speech_accuracy_tests_api import CustomSpeechAccuracyTestsApi
  File "/home/dave/PycharmProjects/googleTest/azure/lib/python3.6/site-packages/swagger_client/api/custom_speech_accuracy_tests_api.py", line 21, in <module>
    from swagger_client.api_client import ApiClient
  File "/home/dave/PycharmProjects/googleTest/azure/lib/python3.6/site-packages/swagger_client/api_client.py", line 27, in <module>
    import swagger_client.models
  File "/home/dave/PycharmProjects/googleTest/azure/lib/python3.6/site-packages/swagger_client/models/__init__.py", line 23, in <module>
    from swagger_client.models.endpoint import Endpoint
  File "/home/dave/PycharmProjects/googleTest/azure/lib/python3.6/site-packages/swagger_client/models/endpoint.py", line 19, in <module>
    from swagger_client.models.model import Model # noqa: F401,E501
  File "/home/dave/PycharmProjects/googleTest/azure/lib/python3.6/site-packages/swagger_client/models/model.py", line 20, in <module>
    from swagger_client.models.model import Model # noqa: F401,E501
ImportError: cannot import name 'Model'

deivit78 on 16 Apr 2019

I don't know if the problem is:
from swagger_client.models.model import Model # noqa: F401,E501
class Model(object):

It seems ambiguous.
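
The last two traceback frames suggest that model.py imports Model from itself before the class statement has run. A minimal sketch that reproduces this kind of failure outside the generated client follows; the module name selfimport_demo and the helper function are hypothetical and only illustrate the mechanism, not the actual swagger_client code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module body: it imports Model from itself before Model exists.
MODULE_SOURCE = (
    "from selfimport_demo import Model  # self-import, as in the traceback\n"
    "\n"
    "class Model(object):\n"
    "    pass\n"
)


def import_self_importing_module():
    """Write the module to a temp folder, import it, and return the error text."""
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "selfimport_demo.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, folder)
        try:
            importlib.import_module("selfimport_demo")
        except ImportError as error:
            return str(error)
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(folder)
            sys.modules.pop("selfimport_demo", None)
    return None


print(import_self_importing_module())
```

The import fails because the module is only partially initialized when it asks itself for Model. Deleting the self-import line from the module body makes the import succeed in this sketch; whether that is safe for the generated client is an assumption to check against the sample's own instructions.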

deivit78 on 16 Apr 2019

@deivit78 I also hit the error listed above with the Python sample. The sample clearly adds some abstraction because the JSON response from the API is fairly complex and actually requires multiple calls, but you can simply use Python requests to write a request like the one below.

POST https://<your-speech-region>.cris.ai/api/speechtotext/v2.0/Transcriptions

Headers
{
"Ocp-Apim-Subscription-Key": "your-subscription-key",
"Content-Type": "application/json"
}

Body
{
  "recordingsUrl": "a generated SAS URL to your audio file in an Azure Storage Account",
  "models": [],
  "locale": "en-US",
  "name": "Transcription using locale en-US",
  "description": "An optional description of the transcription.",
  "properties": {
    "ProfanityFilterMode": "Masked",
    "PunctuationMode": "DictatedAndAutomatic"
  }
}

You can add {"id": "<your-custom-language-model-id>"} into the models array in the body of the payload if you want to use a custom model, otherwise it will use the most up to date unified speech model.
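
A minimal Python sketch of the request above, under a few assumptions: the function names and parameters are hypothetical, and it uses the standard library's urllib so that it has no dependencies. The same url, headers and body could be passed to requests.post(url, headers=headers, json=body) instead.

```python
import json
import urllib.request


def build_transcription_request(region, subscription_key, recordings_url,
                                locale="en-US", model_id=None):
    """Return (url, headers, body) for the POST described above."""
    url = f"https://{region}.cris.ai/api/speechtotext/v2.0/Transcriptions"
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/json",
    }
    body = {
        "recordingsUrl": recordings_url,
        # Empty list: default model. One {"id": ...} entry: custom model.
        "models": [{"id": model_id}] if model_id else [],
        "locale": locale,
        "name": f"Transcription using locale {locale}",
        "description": "An optional description of the transcription.",
        "properties": {
            "ProfanityFilterMode": "Masked",
            "PunctuationMode": "DictatedAndAutomatic",
        },
    }
    return url, headers, body


def submit_transcription(url, headers, body):
    """Send the POST and return the response headers (needs network access)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return dict(response.headers.items())
```

Keeping the request construction separate from the network call makes it easy to print and inspect the payload before spending a real API call on it.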

The response to that request will contain a Location header, which is a URL you can do a GET against to find your transcripts. Because the transcription can take time, the Retry-After header in the response gives the number of seconds you must wait before checking the location (according to the docs).
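
The waiting step can be sketched roughly as below. Everything here is illustrative: the function names are hypothetical, the headers are assumed to arrive as a plain dict with the exact keys Location and Retry-After, and the "status" field with the values "NotStarted" and "Running" is an assumption about the JSON returned by the Location URL that is not stated in this thread.

```python
import json
import time
import urllib.request


def retry_delay(headers, default=30):
    """Seconds to wait, taken from Retry-After when it is a plain number."""
    value = str(headers.get("Retry-After", "")).strip()
    return int(value) if value.isdigit() else default


def get_json(url, subscription_key):
    """GET a URL with the subscription key header (needs network access)."""
    request = urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": subscription_key}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_transcription(post_headers, fetch, sleep=time.sleep, max_polls=20):
    """Poll the Location URL until the job leaves the assumed pending states."""
    location = post_headers["Location"]
    delay = retry_delay(post_headers)
    for _ in range(max_polls):
        sleep(delay)  # wait first, as the Retry-After value asks
        transcription = fetch(location)
        # Assumption: "NotStarted" and "Running" mean the job is not done yet.
        if transcription.get("status") not in ("NotStarted", "Running"):
            return transcription
    raise TimeoutError(f"{location} did not finish after {max_polls} polls")
```

In real use, fetch could be something like lambda url: get_json(url, "your-subscription-key"); passing it in as a parameter keeps the loop testable without network access.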

JamesEarle on 16 May 2019

I'm also very frustrated by this. I'd like to be able to run speech analysis on audio files, but the documentation is nigh-incomprehensible. Honestly, I'd use a different service, but this project requires me to use Azure, so I don't really have a choice.

theelk801 on 10 Jun 2019

👍1

Sorry that you are experiencing problems. We have updated the sample code and instructions with workarounds for a swagger bug (import Model failed) and more detailed instructions. For the authentication problem, make sure that the region given for the endpoint in your code and the region you used to download the swagger client are the same. If you continue to have problems with the sample, consider opening an issue in the samples repo.

chlandsi on 11 Jun 2019

Please add the label speech-service/subsvc to this issue.
Thank you.

tchristiani on 17 Jul 2019

Sample has been updated to address this issue.

please-close

tchristiani on 1 Aug 2019

Agreed that this is ridiculous, why has it taken me over half a day to work out how to get this service to do what I want? There are some code snippets that show how to make it work for mic input under 15 seconds long. That's it. MS have forgotten to document their own product. What exactly was the plan here? Bizarre and annoying.