Hello,
I am trying to attach a SpeakerDiarizationConfig to the RecognitionConfig via the 'diarization_config' parameter, but I haven't been able to make it work, and I don't see any example on the documentation page showing how to do it.
My approach looks as follows:
diarization_config = { "enableSpeakerDiarization": True, "minSpeakerCount": 2, "maxSpeakerCount": 3}
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=frame_rate,
    language_code="es-ES",
    enable_word_time_offsets=True,
    diarization_config=diarization_config,
    enable_automatic_punctuation=True,
)
As far as I understand, 'diarization_config' is supposed to be a SpeakerDiarizationConfig object, but I don't see how to construct one properly.
My actual result is: "ValueError: Protocol message RecognitionConfig has no "diarization_config" field." In contrast, my expected result is a transcript that includes the 'speakerTag' in the word list, like:
{ "startTime": "127.500s", "endTime": "127.700s", "word": "la", "speakerTag": 2 },
{ "startTime": "127.700s", "endTime": "129.300s", "word": "dirección.", "speakerTag": 2 }
Thanks in advance for your kind help.
You should be able to construct a SpeakerDiarizationConfig message directly, e.g.:
from google.cloud.speech_v1 import types, enums
d_config = types.SpeakerDiarizationConfig(
    enable_speaker_diarization=True,
    min_speaker_count=2,
    max_speaker_count=3,
)
r_config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=44100,
    language_code="en-US",
    enable_word_time_offsets=True,
    enable_automatic_punctuation=True,
    diarization_config=d_config,
)
Thank you for the answer @tseaver,
I just tried your suggestion and got the following error:
diarization_config = types.SpeakerDiarizationConfig(
AttributeError: module 'google.cloud.speech_v1.types' has no attribute 'SpeakerDiarizationConfig'
Any idea what could be happening?
Thanks in advance.
@ibalejandro Are you running from a released version of google-cloud-speech, or from the git master branch? That class is not present in any released version.
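A quick way to tell which situation you are in is to check, at runtime, whether the installed copy of the library exposes the class. This is a minimal sketch using only the standard library; it makes no assumption beyond the module path `google.cloud.speech_v1.types` used in the thread:

```python
import importlib
import importlib.util

# Probe for the module without crashing when google-cloud-speech is absent.
try:
    spec = importlib.util.find_spec("google.cloud.speech_v1.types")
except ModuleNotFoundError:
    spec = None

if spec is None:
    # Library (or this surface of it) is not installed at all.
    has_diarization = False
else:
    types_mod = importlib.import_module("google.cloud.speech_v1.types")
    # True only on versions that ship SpeakerDiarizationConfig.
    has_diarization = hasattr(types_mod, "SpeakerDiarizationConfig")

print(has_diarization)
```

If this prints False, the installed release predates the feature and you would need either a newer release or an install from the git master branch.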
@busunkim96 Can we just go ahead and make a 1.3.0 release for google-cloud-speech?
That might be the issue. I am running it from a released version of google-cloud-speech.
Thank you @tseaver, I will be very attentive.
@tseaver I think it is fine, waiting for a final thumbs up from someone on AniML. It looks like this has already been launched on the backend, and I see no borked regeneration PRs.
Ok. I understand. Thank you for the information!
ValueError Traceback (most recent call last)
----> 1 client.long_running_recognize(config, audio)
~/anaconda3/lib/python3.6/site-packages/google/cloud/speech_v1/gapic/speech_client.py in long_running_recognize(self, config, audio, retry, timeout, metadata)
336
337 request = cloud_speech_pb2.LongRunningRecognizeRequest(
--> 338 config=config, audio=audio
339 )
340 operation = self._inner_api_calls["long_running_recognize"](
ValueError: Protocol message RecognitionConfig has no "diarization_config" field.
@kamrankausar The speaker diarization feature is not available in the most recent release of the library. Are you adding it to the config?
@tswast Do you know what the timeline is for the speaker diarization feature?
@kamrankausar Speaker diarization is only a beta feature for now, so you need to use the beta library.
from google.cloud import speech_v1p1beta1

client = speech_v1p1beta1.SpeechClient()
See the code sample at https://cloud.google.com/speech-to-text/docs/multiple-voices
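For reference, here is a minimal sketch of a diarization request against the beta surface. It assumes google-cloud-speech with the v1p1beta1 client is installed; the GAPIC client accepts plain dicts in place of protobuf messages, so the config is built as a dict. The field names follow the beta surface described in the linked sample and may differ in later releases, and the gs:// URI is a placeholder:

```python
# Config for the beta diarization feature, built as a plain dict.
# On the v1p1beta1 surface, diarization is toggled directly on
# RecognitionConfig rather than via a nested SpeakerDiarizationConfig.
config = {
    "encoding": "LINEAR16",
    "language_code": "en-US",
    "enable_word_time_offsets": True,
    "enable_speaker_diarization": True,
    "diarization_speaker_count": 2,
}
audio = {"uri": "gs://my-bucket/my-audio.wav"}  # placeholder URI


def transcribe_with_diarization(config, audio):
    # Import deferred so the dict-building above runs even without the library.
    from google.cloud import speech_v1p1beta1

    client = speech_v1p1beta1.SpeechClient()
    operation = client.long_running_recognize(config, audio)
    response = operation.result(timeout=300)
    # Each word in the final result carries a speaker_tag once
    # diarization is enabled.
    return response
```

The speaker tags appear on the word-level results of the response, which is what produces the 'speakerTag' fields shown in the expected output above.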
I've started the process to release google-cloud-speech 1.3.0 with that feature.