Google-cloud-python: Speech: long audio files produce error

Created on 26 Dec 2018 · 11 Comments · Source: googleapis/google-cloud-python

I got the error message "None Unexpected state: Long-running operation had neither response nor error set." when I used the long_running_recognize() function on long audio files (> 1h). A few files worked correctly, but most of the long files produced this error. A 1m 30s audio file works with the same function, but files longer than 1 hour mostly fail.

Environment details

API - Speech
OS - macOS
Python version

$ python3 --version
Python 3.7.0

google-cloud-speech version

$ pip3 show google-cloud-speech
Name: google-cloud-speech
Version: 0.36.0

Description

I am writing a word-detection Python script that counts occurrences of a target Korean word.
The input audio files are recorded with the same mobile app and are continuously stored in Cloud Storage.
All files have exactly the same length (6 hours) and the same encoding and file format.

For instance, I have two 6-hour raw audio files:

A.wav
B.wav

Both files have the same encoding options:

Sample rate: 44100Hz
Channel: Mono
Encoding: Raw (16-bit signed, little endian)
Length: Exactly 6 hours (= 360 min = 360 * 60 sec)

Because the long_running_recognize() function in Python (and gcloud ml speech recognize-long-running on the gcloud command line) in the Google Cloud Speech API only supports audio up to 3 hours long, I cut those files into 1-hour, 2-hour, and 3-hour files using Audacity:

A_1h.wav
A_2h.wav
A_3h.wav
B_1h.wav
B_2h.wav
B_3h.wav
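
For anyone who wants to script the cutting step instead of using Audacity, here is a minimal sketch using only the standard-library wave module. It is not what was used in this report: the function name split_wav and the output naming pattern are hypothetical, and it produces equal-length chunks rather than the 1h/2h/3h cuts listed above.

```python
import wave


def split_wav(src_path, chunk_seconds, out_pattern="{stem}_part{index}.wav"):
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds.

    Hypothetical helper: the name and the output pattern are illustrative.
    Returns the list of chunk paths that were written.
    """
    stem = src_path[:-4] if src_path.lower().endswith(".wav") else src_path
    out_paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * src.getframerate())
        index = 0
        while True:
            # Each read holds one whole chunk in memory (about 318 MB for
            # one hour of 44.1 kHz 16-bit mono audio).
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            index += 1
            out_path = out_pattern.format(stem=stem, index=index)
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width and rate; the frame count in
                # the header is corrected when the file is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
    return out_paths
```

For example, split_wav("A.wav", 3600) would write A_part1.wav, A_part2.wav, and so on, each at most one hour long.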

Steps to reproduce

Not clear, but in my case most long audio files (> 1h) that include non-English conversation produce the error.

Code example

I ran the following Python code, and 5 of the files (all except A_2h.wav) give me the same error. A_2h.wav works properly and returns transcript and confidence values.

import os
import sys
import json
import threading
import time

def transcribe_gcs(gcs_uri, hint):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()

    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
        # encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=44100,
        language_code='ko-KR',
        enable_word_time_offsets=True,
        speech_contexts=[speech.types.SpeechContext(
            phrases=hint,
        )])

    initial_time = time.time()
    try:
        operation = client.long_running_recognize(config, audio)

        print('Waiting for operation to complete...')
        response = operation.result(timeout=1200)

        with open('test.txt', 'w') as audacity_label:
            for result in response.results:
                # The first alternative is the most likely one for this portion.
                transcript = result.alternatives[0].transcript
                confidence = result.alternatives[0].confidence
                words = result.alternatives[0].words
                for word in words:
                    if word.word in hint:
                        start_time = word.start_time.seconds + (word.start_time.nanos / 1000000000)
                        end_time = word.end_time.seconds + (word.end_time.nanos / 1000000000)
                        print(u'{} - {}: {}'.format(start_time, end_time, word.word))
                        print(u'Transcript: {}'.format(transcript))
                        print('Confidence: {}'.format(confidence))
                        audacity_label.write(u'{}\t{}\t{}\n'.format(start_time, end_time, confidence))
                        break
    except Exception as e:
        print('Error occurred!', e)
    finally:
        print('Total time: %f second' % float(time.time() - initial_time))

gcs_uri = sys.argv[1]
hint = [u'테스트', u'안녕'] # Korean hints

t = threading.Thread(target=transcribe_gcs, args=(gcs_uri, hint))
t.start()
print('Thread started')
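
The time-offset arithmetic and the label line written by the script can be isolated into two small helpers, which makes them easy to check without calling the API. This is only a sketch: the helper names are hypothetical, and the three-decimal formatting is an assumption (the script above writes the raw float values).

```python
def offset_to_seconds(seconds, nanos):
    """Combine the seconds and nanos parts of a word offset into float seconds."""
    return seconds + nanos / 1e9


def audacity_label_line(start, end, label):
    """Build one tab-separated label line: start, end, label text."""
    return "{:.3f}\t{:.3f}\t{}\n".format(start, end, label)


# Example: a word spanning 12.5 s to 13.0 s, labelled with its confidence.
line = audacity_label_line(
    offset_to_seconds(12, 500000000),
    offset_to_seconds(13, 0),
    0.91,
)
print(repr(line))
```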

Stack trace (error message)

$ python3 googlecloud-speech.py gs://BUCKET_NAME/A_1h.wav
Thread started
Waiting for operation to complete...
Error occurred! None Unexpected state: Long-running operation had neither response nor error set.
Total time: 415.695342 second

I think None Unexpected state: Long-running operation had neither response nor error set. is the key error message.

Additional information

The same file always returns the same result. (A_2h.wav always returns a valid result without errors, and the other files always fail.)
I tried another encoding and file format, .flac. I converted the files using sox and checked that they play correctly, but I got the same error message.
I also tried the .raw file format (PCM data without a WAV file header), but got the same error.

I also tried the gcloud command, which also failed:

$ gcloud ml speech recognize-long-running 'gs://BUCKET_NAME/A_1h.wav' --language-code='ko-KR' --sample-rate=44100 --async
Check operation [2383096597783330461] for status.
{
  "name": "2383096597783330461"
}

$ gcloud ml speech operations describe 2383096597783330461
{
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "lastUpdateTime": "2018-12-26T05:21:51.609280Z",
    "progressPercent": 15,
    "startTime": "2018-12-26T05:20:49.633783Z"
  },
  "name": "2383096597783330461"
}

I waited for the operation to finish and got the final message:

$ gcloud ml speech operations wait 2383096597783330461
Waiting for operation [2383096597783330461] to complete...done.
[]

The detailed information is:

$ gcloud ml speech operations describe 2383096597783330461
{
  "done": true,
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "lastUpdateTime": "2018-12-26T05:27:17.214045Z",
    "progressPercent": 100,
    "startTime": "2018-12-26T05:20:49.633783Z"
  },
  "name": "2383096597783330461"
}

I think that either the error field or the response field must be set because "done": true, according to https://cloud.google.com/speech-to-text/docs/reference/rpc/google.longrunning, but neither field is set.
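
The check described here can be written down as a small classifier over the JSON that gcloud ml speech operations describe prints. This is a sketch, not part of any client library: the function name and the state labels are made up for illustration, and the sample dictionary is a trimmed copy of the output above.

```python
def classify_operation(op):
    """Classify a long-running operation given as a plain dict (parsed JSON)."""
    if not op.get("done"):
        return "running"
    if "error" in op:
        return "failed"
    if "response" in op:
        return "succeeded"
    # done is true, but neither result field is present: the state that
    # this issue is about.
    return "done-without-result"


final_state = {
    "done": True,
    "metadata": {"progressPercent": 100},
    "name": "2383096597783330461",
}
print(classify_operation(final_state))
```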

I also checked several related issues, but none of them helped.

question backend speech

Source

ipuris

Most helpful comment

Thanks for the continued input everyone! Just confirmed that the engineering team is now aware of the issue and I will continue to pass along all of the comments here to help get this prioritized as best I can!

beccasaurus on 14 Jan 2019

👍4

All 11 comments

@ipuris Have you looked at the limits page for the Speech API?

tseaver on 3 Jan 2019

Sorry for the late response. I had not checked the limits page carefully, so I spent a couple of days testing the limits with multiple files.
In short, I don't think the limits are the cause of the problem.

One of the sound files I tested (gs://deeply-test/SUB1_11_12_1h_FAILED.wav, which I made publicly available at https://storage.googleapis.com/deeply-test/SUB1_11_12_1h_FAILED.wav) is a 1-hour recording of an empty room, so it contains no speech, only noise.
To avoid the daily request limits, I also tested the file again more than 24 hours after the previous request.
But the Speech API returns the same error.

$ python3 googlecloud-speech.py gs://deeply-test/SUB1_11_12_1h_FAILED.wav
Thread started
Waiting for operation to complete...
Error occurred! None Unexpected state: Long-running operation had neither response nor error set.
Total time: 405.020983 second

For reference, this is the file information:

$ sox --i SUB1_11_12_1h_FAILED.wav

Input File     : 'SUB1_11_12_1h_FAILED.wav'
Channels       : 1
Sample Rate    : 44100
Precision      : 16-bit
Duration       : 01:00:00.00 = 158760000 samples = 270000 CDDA sectors
File Size      : 318M
Bit Rate       : 706k
Sample Encoding: 16-bit Signed Integer PCM
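
As a sanity check, the figures sox reports are consistent with exactly one hour of 44.1 kHz, 16-bit mono PCM. The arithmetic can be reproduced as follows (the function name is hypothetical; CDDA sectors are 1/75 of a second each):

```python
def expected_pcm_stats(sample_rate, bits_per_sample, channels, duration_seconds):
    """Compute the sample count, data size, and bit rate of uncompressed PCM."""
    samples = sample_rate * duration_seconds
    return {
        "samples": samples,
        "data_bytes": samples * channels * bits_per_sample // 8,
        "bit_rate": sample_rate * bits_per_sample * channels,
        "cdda_sectors": duration_seconds * 75,
    }


stats = expected_pcm_stats(44100, 16, 1, 3600)
print(stats)
```

This gives 158760000 samples, 317520000 bytes of audio data (about 318M), a bit rate of 705600 bit/s (about 706k), and 270000 CDDA sectors, matching the sox output.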

I tested more files, some with and some without conversations, but the presence of conversation doesn't appear to be the cause of the error. Unfortunately, the files contain privacy-sensitive conversations of our beta testers, so I can't provide more of them.

By the way, if this is a limits error, it would be better for the error message to state the reason explicitly.

Thank you for your help.

ipuris on 9 Jan 2019

@beccasaurus Can you please comment?

tseaver on 9 Jan 2019

@tseaver Reaching out to the Speech team...

beccasaurus on 10 Jan 2019

Note that I've seen the "Long-running operation had neither response nor error set." symptom from long_running_recognize when:

The file was uploaded via e.g. blob=bucket.blob(blobname); blob.upload_from_file(..)
The blob already existed within the bucket, and upload_from_file was replacing what existed rather than creating a new.. thing.
The bucket has an expiration policy set up.

I haven't seen this particular error since I started using "fresh" blobs for anything I upload into gcs instead of uploading on top of them. It may be a completely different error, but just FYI.
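
One way to follow the "fresh blob" approach is to give every upload a unique object name so that it never overwrites an existing blob. A minimal sketch, assuming a hypothetical helper name and a hypothetical "speech-input" prefix; whether this actually avoids the error is only the observation reported above, not something confirmed.

```python
import os
import uuid


def fresh_blob_name(filename, prefix="speech-input"):
    """Return an object name that is very unlikely to exist already.

    A random hex suffix is inserted before the extension, e.g.
    speech-input/A_1h-<32 hex chars>.wav
    """
    stem, ext = os.path.splitext(os.path.basename(filename))
    return "{}/{}-{}{}".format(prefix, stem, uuid.uuid4().hex, ext)


print(fresh_blob_name("recordings/A_1h.wav"))
```

With the google-cloud-storage client, the result would be passed to bucket.blob(...) before calling upload_from_file or upload_from_filename on the returned blob.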

mcdonc on 10 Jan 2019

Team may/may not know about scenarios when Operation has neither its .response nor its .error set when it's .done. Hoping they find this report useful. Will update if I hear anything back.

AFAIK when a long-running operation is .done, one of those fields should be set

beccasaurus on 10 Jan 2019

Created an external tracking ticket, because this isn't related to the Python client library (this repository and its issues)

https://issuetracker.google.com/issues/122732566

beccasaurus on 11 Jan 2019

@jerjou This is still a problem for me.

My current workaround is to not use long_running_recognize for short recordings, but even that seems to produce different results than before. Previously, the API produced a one-entry response with an empty "alternatives" list whenever the entire audio was silent, whereas now the response is simply an empty list without any entries. (My if statements that depended on the previous format are failing.) Just wanted to bring it to your attention.
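
Code that has to cope with both response shapes described here can treat "no results" and "a result with no alternatives" the same way. A minimal sketch, using SimpleNamespace objects as stand-ins for the real response types; the function name is hypothetical and only the results, alternatives, and transcript attribute names come from the script earlier in this thread.

```python
from types import SimpleNamespace


def first_transcripts(response):
    """Collect the top transcript of each result, skipping empty results."""
    transcripts = []
    for result in getattr(response, "results", None) or []:
        alternatives = getattr(result, "alternatives", None) or []
        if not alternatives:
            # Older shape for silent audio: one result, no alternatives.
            continue
        transcripts.append(alternatives[0].transcript)
    return transcripts


# Stand-ins for the two shapes reported for silent audio, plus a normal one.
old_shape = SimpleNamespace(results=[SimpleNamespace(alternatives=[])])
new_shape = SimpleNamespace(results=[])
normal = SimpleNamespace(
    results=[SimpleNamespace(alternatives=[SimpleNamespace(transcript=u"안녕")])]
)

print(first_transcripts(old_shape))  # []
print(first_transcripts(new_shape))  # []
print(first_transcripts(normal))
```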

berkcoker on 14 Jan 2019

beccasaurus on 14 Jan 2019

👍4

I am also facing the same problem.

File "/usr/local/lib/python2.7/dist-packages/google/api_core/future/polling.py", line 127, in result
    raise self._exception
google.api_core.exceptions.GoogleAPICallError: None Unexpected state: Long-running operation had neither response nor error set.