Google Speech Recognition API result is empty

Bru*_*uno 14 speech-recognition google-api google-cloud-speech

I'm making an asynchronous request to the Google Cloud Speech API, and I don't know how to get the result of the operation:

POST request: https://speech.googleapis.com/v1beta1/speech:asyncrecognize

Body:

{
    "config": {
        "languageCode": "pt-BR",
        "encoding": "LINEAR16",
        "sampleRate": 16000
    },
    "audio": {
        "uri": "gs://bucket/audio.flac"
    }
}

Which returns:

{ "name": "469432517" }

So I did a POST: https://speech.googleapis.com/v1beta1/operations/469432517

Which returns:

{
    "name": "469432517",
    "metadata": {
        "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata",
        "progressPercent": 100,
        "startTime": "2016-08-11T21:18:29.985053Z",
        "lastUpdateTime": "2016-08-11T21:18:31.888412Z"
    },
    "done": true,
    "response": {
        "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
    }
}

I need to get the result of the operation: the transcribed text.

How can I do that?
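
For reference, a successful v1beta1 AsyncRecognizeResponse carries a `results` list, where each entry holds `alternatives` with a `transcript`. Below is a minimal Python sketch of reading that structure from the operation JSON. The helper name `extract_transcripts` is made up for illustration, and the `error` check assumes the usual long-running-operation shape. Applied to the operation shown above, it returns an empty list, because the `response` object has no `results` field at all.

```python
from typing import Any, Dict, List


def extract_transcripts(operation: Dict[str, Any]) -> List[str]:
    """Return the top transcript of every result in a finished operation."""
    if not operation.get("done"):
        raise ValueError("operation is still running; poll it again later")
    if "error" in operation:
        raise RuntimeError(f"recognition failed: {operation['error']}")
    # A response without "results" means nothing was recognized.
    results = operation.get("response", {}).get("results", [])
    return [
        result["alternatives"][0]["transcript"]
        for result in results
        if result.get("alternatives")
    ]


# The operation from the question: done, but without any "results" field.
empty_operation = {
    "name": "469432517",
    "done": True,
    "response": {
        "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
    },
}
print(extract_transcripts(empty_operation))  # prints []
```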

Nik*_*rev 7

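You did get the result of the operation; it is simply empty. The result is empty because of a format mismatch: the request declares "LINEAR16" (uncompressed PCM data, basically a WAV file), but the file you submitted is FLAC (a compressed format).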
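
One way to avoid this kind of mismatch is to derive the `encoding` field from the file itself instead of typing it by hand. The sketch below is only an illustration: the `build_request` helper and the extension table are hypothetical, and it assumes the file extension honestly reflects the audio format. The field names follow the v1beta1 request body from the question.

```python
import os
from typing import Any, Dict

# Hypothetical lookup table: file extension -> v1beta1 encoding name.
ENCODING_BY_EXTENSION = {
    ".flac": "FLAC",
    ".wav": "LINEAR16",
    ".raw": "LINEAR16",
}


def build_request(uri: str, language_code: str, sample_rate: int) -> Dict[str, Any]:
    """Build an asyncrecognize body whose encoding matches the file extension."""
    extension = os.path.splitext(uri)[1].lower()
    if extension not in ENCODING_BY_EXTENSION:
        raise ValueError(f"no known encoding for extension {extension!r}")
    return {
        "config": {
            "languageCode": language_code,
            "encoding": ENCODING_BY_EXTENSION[extension],
            "sampleRate": sample_rate,
        },
        "audio": {"uri": uri},
    }


request = build_request("gs://bucket/audio.flac", "pt-BR", 16000)
print(request["config"]["encoding"])  # prints FLAC
```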

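Other possible causes of an empty result are an incorrect sample rate, an incorrect number of channels, and so on.

Finally, a file that contains nothing but silence will also produce an empty response.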
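
These properties can be checked locally before uploading anything. The following is a rough sketch using Python's standard `wave` module, assuming a 16-bit PCM WAV file. The `inspect_wav` helper is hypothetical. It reports the sample rate, the channel count, and the peak amplitude; a peak of 0 means the file is pure silence. The demo at the end writes one second of silence to a temporary file just to exercise the function.

```python
import array
import os
import sys
import tempfile
import wave
from typing import Dict


def inspect_wav(path: str) -> Dict[str, int]:
    """Read the header and peak amplitude of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples for LINEAR16")
        info = {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
        }
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    info["peak"] = max((abs(s) for s in samples), default=0)
    return info


# Demo: one second of mono silence at 16 kHz.
path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

print(inspect_wav(path))  # {'sample_rate': 16000, 'channels': 1, 'peak': 0}
```

The `sample_rate` value is what the request's sample rate field should match, and a `channels` value other than the one the API expects is worth fixing before retrying.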


Arm*_*man 6

I ran into this problem too. The cause may be the encoding and the sample rate. Here is how I found the right encoding and rate:

# Written against google-cloud-speech 1.x, which still ships the enums and types modules.
from google.cloud import speech
from google.cloud.speech import enums, types

# CLIENT is assumed to be a plain SpeechClient.
CLIENT = speech.SpeechClient()

# `content` must already hold the raw bytes of the audio file.
audio = types.RecognitionAudio(content=content)

ENCODING = [
    enums.RecognitionConfig.AudioEncoding.LINEAR16,
    enums.RecognitionConfig.AudioEncoding.FLAC,
    enums.RecognitionConfig.AudioEncoding.MULAW,
    enums.RecognitionConfig.AudioEncoding.AMR,
    enums.RecognitionConfig.AudioEncoding.AMR_WB,
    enums.RecognitionConfig.AudioEncoding.OGG_OPUS,
    enums.RecognitionConfig.AudioEncoding.SPEEX_WITH_HEADER_BYTE,
]
SAMPLE_RATE_HERTZ = [8000, 12000, 16000, 24000, 48000]

for enco in ENCODING:
    for rate in SAMPLE_RATE_HERTZ:
        config = types.RecognitionConfig(
            encoding=enco,
            sample_rate_hertz=rate,
            language_code='fa-IR')

        # Try to recognize speech with this encoding/rate combination
        response = []
        try:
            response = CLIENT.recognize(config, audio)
        except Exception:
            # Invalid combinations raise an error; skip them and keep going
            pass
        print("-----------------------------------------------------")
        print(str(rate) + "   " + str(enco))
        print("response: ", str(response))
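
Reading through 35 printed responses by eye gets tedious, so the same sweep can collect only the combinations that actually produced a transcript. This is a sketch, not part of the original answer: `find_working_configs` is a hypothetical helper that takes the recognition call as a function, and it assumes the response object exposes `results`, `alternatives`, and `transcript` attributes as the client library's responses do. The demo uses a fake recognizer so it runs without credentials.

```python
from itertools import product
from types import SimpleNamespace
from typing import Any, Callable, Iterable, List, Tuple


def find_working_configs(
    recognize: Callable[[Any, int], Any],
    encodings: Iterable[Any],
    rates: Iterable[int],
) -> List[Tuple[Any, int, str]]:
    """Try every (encoding, rate) pair; keep those that yield a transcript."""
    hits = []
    for encoding, rate in product(encodings, rates):
        try:
            response = recognize(encoding, rate)
        except Exception:
            continue  # this combination was rejected; move on
        for result in response.results:
            if result.alternatives:
                hits.append((encoding, rate, result.alternatives[0].transcript))
                break  # the first transcript is enough to mark a hit
    return hits


def fake_recognize(encoding: str, rate: int) -> SimpleNamespace:
    """Stand-in for the real API call: only FLAC at 16 kHz 'works'."""
    if encoding == "FLAC" and rate == 16000:
        alternative = SimpleNamespace(transcript="hello")
        return SimpleNamespace(results=[SimpleNamespace(alternatives=[alternative])])
    return SimpleNamespace(results=[])


print(find_working_configs(fake_recognize, ["LINEAR16", "FLAC"], [8000, 16000]))
# prints [('FLAC', 16000, 'hello')]
```

With the real client, `recognize` would be a small function that builds a `types.RecognitionConfig` from the encoding and rate and calls `CLIENT.recognize(config, audio)`, as in the loop above.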