Bru*_*uno 14 speech-recognition google-api google-cloud-speech
I am making an asynchronous request to the Google Cloud Speech API, and I don't know how to get the result of the operation:
Request: POST https://speech.googleapis.com/v1beta1/speech:asyncrecognize
Body:
{
"config":{
"languageCode" : "pt-BR",
"encoding" : "LINEAR16",
"sampleRate" : 16000
},
"audio":{
"uri":"gs://bucket/audio.flac"
}
}
Which returns:
{ "name": "469432517" }
So I made a POST: https://speech.googleapis.com/v1beta1/operations/469432517
Which returns:
{
"name": "469432517",
"metadata": {
"@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata",
"progressPercent": 100,
"startTime": "2016-08-11T21:18:29.985053Z",
"lastUpdateTime": "2016-08-11T21:18:31.888412Z"
},
"done": true,
"response": {
"@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
}
}
I need to get the result of the operation: the transcribed text.
How can I do that?
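You did get the result of the operation; it is just empty. The result is empty because of a format mismatch: the config declares "LINEAR16" (uncompressed PCM data, basically a WAV file), but the file you submitted is FLAC (a compressed format).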
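To make "empty" concrete, here is a minimal sketch of pulling transcripts out of an operation body like the one in the question. `extract_transcripts` is a hypothetical helper, and the `results` / `alternatives` / `transcript` layout is assumed from the documented shape of the recognize response; when recognition produces nothing, the `results` key is simply absent.

```python
def extract_transcripts(operation):
    """Return the top transcript of each result in a finished operation, or []."""
    if not operation.get("done"):
        raise ValueError("operation is still running; poll again later")
    if "error" in operation:
        # Failed operations carry an error object instead of a response.
        raise RuntimeError(operation["error"].get("message", "operation failed"))
    # Assumed layout: response.results[].alternatives[].transcript
    results = operation.get("response", {}).get("results", [])
    return [
        alt["transcript"]
        for result in results
        for alt in result.get("alternatives", [])[:1]  # first alternative only
        if "transcript" in alt
    ]


# The operation from the question: done, but the response has no "results".
empty_operation = {
    "name": "469432517",
    "done": True,
    "response": {
        "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
    },
}
print(extract_transcripts(empty_operation))  # []
```

An empty list here means the request itself succeeded and the recognizer found no speech it could decode, which points at the audio or its config and not at how the operation was fetched.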
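Other reasons for an empty result include an incorrect sample rate, an incorrect number of channels, and so on.
Finally, a file containing pure silence will also produce an empty response.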
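Before sending a request, it can help to check what the file actually contains. The following is a rough sketch using only the Python standard library; `sniff_container` and `wav_parameters` are hypothetical helper names, and the check only looks at the leading magic bytes (`fLaC` for FLAC, `RIFF`...`WAVE` for WAV), so it is a sanity check and not a full validator.

```python
import os
import tempfile
import wave


def sniff_container(path):
    """Guess the container format from the first bytes of the file."""
    with open(path, "rb") as f:
        header = f.read(12)
    if header[:4] == b"fLaC":
        return "FLAC"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV"
    return "unknown"


def wav_parameters(path):
    """Return (sample_rate_hz, channels, sample_width_bytes) of a PCM WAV file."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth()


# Demo: write a tiny mono 16-bit 16 kHz WAV (0.1 s of silence) and inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 1600)

print(sniff_container(demo_path))
print(wav_parameters(demo_path))
```

If the container turns out to be FLAC, the `encoding` in the config should say so; if it is WAV, the reported sample rate and channel count are the values to compare against the request.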
I ran into this problem too. The cause may be the encoding and the sample rate. Here is how I found the right encoding and rate:
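from google.cloud import speech
from google.cloud.speech import enums, types

# Assumption: `content` already holds the raw audio bytes, and the installed
# google-cloud-speech is a pre-2.0 release (the ones that ship enums/types).
CLIENT = speech.SpeechClient()
audio = types.RecognitionAudio(content=content)
ENCODING = [
    enums.RecognitionConfig.AudioEncoding.LINEAR16,
    enums.RecognitionConfig.AudioEncoding.FLAC,
    enums.RecognitionConfig.AudioEncoding.MULAW,
    enums.RecognitionConfig.AudioEncoding.AMR,
    enums.RecognitionConfig.AudioEncoding.AMR_WB,
    enums.RecognitionConfig.AudioEncoding.OGG_OPUS,
    enums.RecognitionConfig.AudioEncoding.SPEEX_WITH_HEADER_BYTE,
]
SAMPLE_RATE_HERTZ = [8000, 12000, 16000, 24000, 48000]
# Try every encoding/rate combination and print what the API returns for each.
for enco in ENCODING:
    for rate in SAMPLE_RATE_HERTZ:
        config = types.RecognitionConfig(
            encoding=enco,
            sample_rate_hertz=rate,
            language_code='fa-IR')
        response = []
        try:
            # Detects speech in the audio file
            response = CLIENT.recognize(config, audio)
        except Exception as exc:
            # Invalid combinations are rejected; report them and keep going.
            print("error: ", exc)
        print("-----------------------------------------------------")
        print(str(rate) + " " + str(enco))
        print("response: ", str(response))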