Hello,
I am trying to attach a SpeakerDiarizationConfig to the RecognitionConfig via the 'diarization_config' parameter, but I haven't been able to make it work, and I don't see any example on the documentation page showing how to do it.
My approach looks as follows:
diarization_config = { "enableSpeakerDiarization": True, "minSpeakerCount": 2, "maxSpeakerCount": 3}
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=frame_rate,
    language_code="es-ES",
    enable_word_time_offsets=True,
    diarization_config=diarization_config,
    enable_automatic_punctuation=True,
)
As far as I understand, 'diarization_config' is supposed to be a SpeakerDiarizationConfig object, but I don't see how to construct one properly.
My actual result is: "ValueError: Protocol message RecognitionConfig has no "diarization_config" field." In contrast, my expected result is a transcript that includes the 'speakerTag' in the word list, like:
{ "startTime": "127.500s", "endTime": "127.700s", "word": "la", "speakerTag": 2 },
{ "startTime": "127.700s", "endTime": "129.300s", "word": "dirección.", "speakerTag": 2 }
Thanks in advance for your kind help.
You should be able to construct a SpeakerDiarizationConfig message directly, e.g.:
from google.cloud.speech_v1 import types, enums
d_config = types.SpeakerDiarizationConfig(
    enable_speaker_diarization=True,
    min_speaker_count=2,
    max_speaker_count=3,
)
r_config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=44100,
    language_code="en-US",
    enable_word_time_offsets=True,
    enable_automatic_punctuation=True,
    diarization_config=d_config,
)
Thank you for the answer @tseaver,
I just tried your suggestion and got the following error:
diarization_config = types.SpeakerDiarizationConfig(
AttributeError: module 'google.cloud.speech_v1.types' has no attribute 'SpeakerDiarizationConfig'
Any idea what could be happening?
Thanks in advance.
@ibalejandro Are you running from a released version of google-cloud-speech, or from the git master branch? That class is not present in any released version.
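A quick way to tell which situation you are in is to check, at runtime, whether the installed copy of the library exposes the class. This is a minimal sketch using only the standard library; it makes no assumption beyond the module path `google.cloud.speech_v1.types` used in the thread:

```python
import importlib
import importlib.util

# Probe for the module without crashing when google-cloud-speech is absent.
try:
    spec = importlib.util.find_spec("google.cloud.speech_v1.types")
except ModuleNotFoundError:
    spec = None

if spec is None:
    # Library (or this surface of it) is not installed at all.
    has_diarization = False
else:
    types_mod = importlib.import_module("google.cloud.speech_v1.types")
    # True only on versions that ship SpeakerDiarizationConfig.
    has_diarization = hasattr(types_mod, "SpeakerDiarizationConfig")

print(has_diarization)
```

If this prints False, the installed release predates the feature and you would need either a newer release or an install from the git master branch.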
@busunkim96 Can we just go ahead and make a 1.3.0 release for google-cloud-speech?
That might be the issue. I am running it from a released version of google-cloud-speech.
Thank you @tseaver, I will be very attentive.
@tseaver I think it is fine, waiting for a final thumbs up from someone on AniML. It looks like this has already been launched on the backend, and I see no borked regeneration PRs.
Ok. I understand. Thank you for the information!
ValueError Traceback (most recent call last)
----> 1 client.long_running_recognize(config, audio)
~/anaconda3/lib/python3.6/site-packages/google/cloud/speech_v1/gapic/speech_client.py in long_running_recognize(self, config, audio, retry, timeout, metadata)
336
337 request = cloud_speech_pb2.LongRunningRecognizeRequest(
--> 338 config=config, audio=audio
339 )
340 operation = self._inner_api_calls["long_running_recognize"](
ValueError: Protocol message RecognitionConfig has no "diarization_config" field.
@kamrankausar The speaker diarization feature is not available in the most recent release of the library. Are you adding it to the config?
@tswast Do you know what the timeline is for the speaker diarization feature?
@kamrankausar Speaker diarization is only a beta feature for now, so you need to use the beta library.
from google.cloud import speech_v1p1beta1

client = speech_v1p1beta1.SpeechClient()
See the code sample at https://cloud.google.com/speech-to-text/docs/multiple-voices
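For reference, here is a minimal sketch of a diarization request against the beta surface. It assumes google-cloud-speech with the v1p1beta1 client is installed; the GAPIC client accepts plain dicts in place of protobuf messages, so the config is built as a dict. The field names follow the beta surface described in the linked sample and may differ in later releases, and the gs:// URI is a placeholder:

```python
# Config for the beta diarization feature, built as a plain dict.
# On the v1p1beta1 surface, diarization is toggled directly on
# RecognitionConfig rather than via a nested SpeakerDiarizationConfig.
config = {
    "encoding": "LINEAR16",
    "language_code": "en-US",
    "enable_word_time_offsets": True,
    "enable_speaker_diarization": True,
    "diarization_speaker_count": 2,
}
audio = {"uri": "gs://my-bucket/my-audio.wav"}  # placeholder URI


def transcribe_with_diarization(config, audio):
    # Import deferred so the dict-building above runs even without the library.
    from google.cloud import speech_v1p1beta1

    client = speech_v1p1beta1.SpeechClient()
    operation = client.long_running_recognize(config, audio)
    response = operation.result(timeout=300)
    # Each word in the final result carries a speaker_tag once
    # diarization is enabled.
    return response
```

The speaker tags appear on the word-level results of the response, which is what produces the 'speakerTag' fields shown in the expected output above.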
I've started the process to release google-cloud-speech 1.3.0 with that feature.