Azure-docs: What's the maximum limit for Batch Transcription? I mean file size and duration

Created on 18 Apr 2020 · 6 Comments · Source: MicrosoftDocs/azure-docs

For example, is the maximum file size 100 MB?
Is the maximum duration 1 hour?


Pri2 cognitive-servicesvc cxp product-question speech-servicsubsvc triaged

All 6 comments

I am using Batch Transcription: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription

I am not using any SDK; I just get an auth token over HTTP and create the transcription task with the REST API.

Here is how I do it:

  1. Upload the audio file to a container in an Azure Storage account
  2. Get a publicly accessible URL for that file
  3. Send that URL to the batch transcription REST API ("Creates a new transcription" POST api/speechtotext/v2.0/transcriptions)
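
The three steps above could be sketched roughly as follows. This is only an illustration, not the poster's actual code: the host name, the `Ocp-Apim-Subscription-Key` header, and the JSON field names (`recordingsUrl`, `locale`, `name`) are assumptions based on how the v2.0 API is commonly called, and the function name is hypothetical. Only the path `api/speechtotext/v2.0/transcriptions` comes from the thread. The request is built but not sent.

```python
import json
import urllib.request


def build_create_transcription_request(host, subscription_key, recording_url,
                                       locale="en-US", name="Example transcription"):
    """Build (but do not send) a POST request that creates a batch transcription.

    host             -- assumed to look like "<region>.cris.ai" (an assumption)
    subscription_key -- Speech resource key, sent in an assumed header name
    recording_url    -- publicly accessible URL of the uploaded audio blob (step 2)
    """
    # The path is the one quoted in the thread.
    endpoint = f"https://{host}/api/speechtotext/v2.0/transcriptions"

    # Field names are assumptions; check the API reference before relying on them.
    body = {
        "recordingsUrl": recording_url,
        "locale": locale,
        "name": name,
    }

    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_create_transcription_request(
        "westus.cris.ai",                     # hypothetical host
        "YOUR_KEY",                           # placeholder key
        "https://example.invalid/audio.wav",  # placeholder blob URL
    )
    print(req.get_method(), req.full_url)
    # To actually submit it: urllib.request.urlopen(req)
```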

Question

What's the limit? Is the maximum file size 100 MB, or something else? Is the maximum duration 1 hour, or something else?

@1c7, thank you for reaching out. We are looking into this and will get back to you soon on this thread.

@1c7 Can you please add more detail about the input audio files that you are trying to transcribe?
A standard subscription (S0) for the Speech service is required to use batch transcription. Please see the details below for limits.

image

https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/
Please see the FAQ below.
https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/faq-stt
The limits above and in the doc are the default limits. We do work closely with large customers and can change the quota when necessary.

The current API v2.0 only accepts a single file per request. Version 3 of the batch API is going to become available; that version will allow you to supply a container as the input.

@ram-msft
Hi, the input audio file comes from the user, so it could be any format, size, or duration.
I am trying to find out what the limit is,
so my program can say: Sorry, the file is too big / the duration is too long.

The FAQ https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/faq-stt
doesn't state a file size limit or a duration limit.

image
This is the request frequency limit.

image
I don't need to train a model; I just use the default one.

Conclusion: the problem is still unsolved.

The current API v2.0 only accepts a single file per request. Version 3 of the batch API is going to become available; that version will allow you to supply a container as the input.

That's great, but it still doesn't answer the question: what's the limit for a single file?

There are no hard-coded limits in the batch transcription service itself regarding file size or audio duration.

A batch transcription request must finish within 48 hours (currently) once it has started processing. This includes downloading the audio blob, transcribing it, and uploading the result data. We transcribe at up to double real-time speed. All of these parameters are internal and can change; service usage and available space (memory / disk) might also be relevant.

I would recommend staying in a manageable range of several hours of audio. I would split longer files to parallelize both the upload and the processing of the audio. Splitting 20 hours of audio into 10 segments of 2 hours each might get you the transcription results in a couple of hours, whereas with one big file you will have to wait at least 10 hours or so.
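
The arithmetic behind that recommendation can be sketched as below. The 48-hour window and the "up to double real-time" speed are the figures quoted in this thread; both are described as internal and subject to change, so treat the constants as assumptions. The function names are hypothetical, and the estimate covers only best-case transcription time, ignoring download, upload, and queueing.

```python
import math

MAX_JOB_HOURS = 48.0   # completion window quoted in the thread ("currently")
SPEED_FACTOR = 2.0     # "up to double real-time"; a best case, not a guarantee


def estimate_processing_hours(audio_hours, segments=1, speed_factor=SPEED_FACTOR):
    """Best-case transcription time, assuming equal segments run fully in parallel."""
    if segments < 1:
        raise ValueError("segments must be at least 1")
    return (audio_hours / segments) / speed_factor


def fits_in_window(audio_hours, segments=1, max_job_hours=MAX_JOB_HOURS):
    """True if the best-case estimate for one segment stays inside the window."""
    return estimate_processing_hours(audio_hours, segments) <= max_job_hours


def min_segments_for_target(audio_hours, target_hours, speed_factor=SPEED_FACTOR):
    """Smallest number of equal segments whose best-case time meets target_hours."""
    return max(1, math.ceil(audio_hours / (target_hours * speed_factor)))


if __name__ == "__main__":
    print(estimate_processing_hours(20))               # 10.0 (one 20-hour file)
    print(estimate_processing_hours(20, segments=10))  # 1.0  (ten 2-hour segments)
    print(min_segments_for_target(20, target_hours=1)) # 10
```

A client could use a check like `fits_in_window` to reject or split very long uploads before submitting them, with a generous safety margin since real throughput may be lower than the best case.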

@wolfma61 Thank you :)
