Why can't my Python script recognize speech in an audio file?

Question

The following code successfully recognizes a short test audio file (under 1 minute), but fails on another, long audio file (1.5 h).

from google.cloud import speech


def run_quickstart():
    speech_client = speech.Client()
    sample = speech_client.sample(source_uri="gs://linear-arena-2109/zoom0070.flac", encoding=speech.Encoding.FLAC)
    alternatives = sample.recognize('uk-UA')
    for alternative in alternatives:
        print(u'Transcript: {}'.format(alternative.transcript))

    with open("Output.txt", "w") as text_file:
        for alternative in alternatives:
            text_file.write(alternative.transcript.encode('utf8'))

if __name__ == '__main__':
    run_quickstart()

Both files are uploaded to Google Cloud.

First: https://storage.googleapis.com/linear-arena-2109/sample.flac

Second: https://storage.googleapis.com/linear-arena-2109/zoom0070.flac

Both were converted from mp3 with the ffmpeg utility:

ffmpeg -i sample.mp3 -ac 1 sample.flac
ffmpeg -i zoom0070.mp3 -ac 1 zoom0070.flac

The first file is recognized successfully, but the second one produces the following error:

google.gax.errors.RetryError: GaxError(Exception occurred in retry method that was not classified as transient, caused by <_Rendezvous of RPC that terminated with (StatusCode.INVALID_ARGUMENT, Sync input too long. For audio longer than 1 min use LongRunningRecognize with a 'uri' parameter.)>)

But I am already using the uri parameter in my Python script. What is wrong?

Update

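@NieDzejkob helped me understand the error: the long_running_recognize method should be used instead of recognize. A full long_running_recognize usage example can be found on the corresponding documentation page.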

Answer 1

pbe*_*gle 5

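For any audio file longer than 1 minute, you need to use asynchronous speech recognition, and the file must be uploaded to Google Cloud Storage so that you can pass a gcs_uri.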

You will also need to use the .long_running_recognize method in your script. An example can be found in the GCP documentation.
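As a rough illustration, here is a minimal sketch of an asynchronous request. It assumes the newer google-cloud-speech client library (version 2.x or later), which exposes speech.SpeechClient rather than the speech.Client used in the question, and it assumes application credentials are already configured. The function names collect_transcripts and transcribe_long_audio and the one-hour timeout are placeholders chosen for this example, not part of the library.

```python
def collect_transcripts(results):
    """Return the top transcript of each result, skipping empty results."""
    return [r.alternatives[0].transcript for r in results if r.alternatives]


def transcribe_long_audio(gcs_uri, language_code="uk-UA", timeout=3600):
    """Transcribe a FLAC file stored in Google Cloud Storage asynchronously.

    Assumes google-cloud-speech 2.x or later and configured credentials.
    The timeout (in seconds) is an arbitrary illustrative value.
    """
    # Imported here so the helper above works without the library installed.
    from google.cloud import speech

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        language_code=language_code,
    )

    # Start the long-running operation, then block until it completes.
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=timeout)
    return collect_transcripts(response.results)
```

With this sketch, the call for the file from the question would be transcribe_long_audio("gs://linear-arena-2109/zoom0070.flac"). It returns a list of transcript strings, one per recognized segment of the audio.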

I realise the OP has already solved this, but I thought it would be useful to provide an answer and generalise it.
