Bas*_*asj 17 audio speech-recognition speech speech-to-text google-speech-api
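It is possible to use Google's speech recognition API to get a transcription of an audio file (WAV, MP3, etc.) by making a request to http://www.google.com/speech-api/v2/recognize?...
Example: I said "one two three four five" in a WAV file. The Google API gave me this: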
{
    u'alternative':
    [
        {u'transcript': u'12345'},
        {u'transcript': u'1 2 3 4 5'},
        {u'transcript': u'one two three four five'}
    ],
    u'final': True
}
Question: is it possible to get the time (in seconds) at which each word was spoken?
With my example:
['one', 0.23, 0.80], ['two', 1.03, 1.45], ['three', 1.79, 2.35], etc.
That is, the word "one" was spoken between 00:00:00.23 and 00:00:00.80, and the word "two" between 00:00:01.03 and 00:00:01.45 (in seconds).
PS: I am looking for an API that supports languages other than English, especially French.
dew*_*ydb 10
I believe the other answer is now out of date. This is now possible with the Google Cloud Speech API: https://cloud.google.com/speech/docs/async-time-offsets
It is not possible with the Google API.
If you need word timestamps, you can use other APIs, such as:
CMUSphinx - a free, offline speech recognition API
小智 7
Yes, this is very much possible. All you need to do is:
Set enable_word_time_offsets=True in the config:
config = types.RecognitionConfig(
    ....
    enable_word_time_offsets=True)
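Since the question asks about French, the same flag can be combined with a language code. Below is a minimal sketch of a JSON request body for the Cloud Speech-to-Text REST interface, assuming the REST field names `languageCode` and `enableWordTimeOffsets`. The helper name `build_request` and the bucket URI are hypothetical, and the sketch only builds the body; it does not send anything.

```python
import json


def build_request(audio_uri, language_code="fr-FR"):
    """Build a recognition request body that asks for per-word time offsets.

    audio_uri is a hypothetical Cloud Storage path. Encoding and sample
    rate are left out here and may be needed depending on the audio format.
    """
    return {
        "config": {
            "languageCode": language_code,
            "enableWordTimeOffsets": True,
        },
        "audio": {"uri": audio_uri},
    }


body = build_request("gs://my-bucket/count.wav")
print(json.dumps(body, indent=2))
```

Keeping the body as a plain dict makes it easy to inspect before handing it to whatever HTTP client or client library you use.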
Then, for each word in the alternative, you can print its start time and end time, as in the following code:
# `response` is assumed to be the recognition result, e.g. the value of operation.result()
for result in response.results:
    # The first alternative is the most likely transcription
    alternative = result.alternatives[0]
    print(u'Transcript: {}'.format(alternative.transcript))
    print('Confidence: {}'.format(alternative.confidence))
    for word_info in alternative.words:
        word = word_info.word
        start_time = word_info.start_time
        end_time = word_info.end_time
        # Each offset carries whole seconds plus nanoseconds
        print('Word: {}, start_time: {}, end_time: {}'.format(
            word,
            start_time.seconds + start_time.nanos * 1e-9,
            end_time.seconds + end_time.nanos * 1e-9))
This will give you output in the following format:
Transcript: Do you want me to give you a call back?
Confidence: 0.949534416199
Word: Do, start_time: 1466.0, end_time: 1466.6
Word: you, start_time: 1466.6, end_time: 1466.7
Word: want, start_time: 1466.7, end_time: 1466.8
Word: me, start_time: 1466.8, end_time: 1466.9
Word: to, start_time: 1466.9, end_time: 1467.1
Word: give, start_time: 1467.1, end_time: 1467.2
Word: you, start_time: 1467.2, end_time: 1467.3
Word: a, start_time: 1467.3, end_time: 1467.4
Word: call, start_time: 1467.4, end_time: 1467.6
Word: back?, start_time: 1467.6, end_time: 1467.7
Source: https://cloud.google.com/speech-to-text/docs/async-time-offsets
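To get the `['one', 0.23, 0.80]` shape asked for in the question, the word entries can be flattened into lists. The sketch below is self-contained and uses stand-in `Duration` and `FakeWord` types instead of a real API response; `to_seconds` and `word_timings` are hypothetical helper names. It assumes the offsets are either objects with `seconds` and `nanos` fields or `datetime.timedelta` values, which newer versions of the client library appear to return, so check which one your installed version gives you.

```python
from collections import namedtuple
from datetime import timedelta

# Stand-ins for the API objects, for illustration only.
Duration = namedtuple("Duration", ["seconds", "nanos"])
FakeWord = namedtuple("FakeWord", ["word", "start_time", "end_time"])


def to_seconds(offset):
    """Convert a time offset to float seconds."""
    if isinstance(offset, timedelta):
        return offset.total_seconds()
    # Otherwise assume an object with whole seconds plus nanoseconds.
    return offset.seconds + offset.nanos * 1e-9


def word_timings(words):
    """Return [[word, start, end], ...] with times rounded to 2 decimals."""
    return [
        [w.word, round(to_seconds(w.start_time), 2), round(to_seconds(w.end_time), 2)]
        for w in words
    ]


words = [
    FakeWord("one", Duration(0, 230000000), Duration(0, 800000000)),
    FakeWord("two", timedelta(seconds=1.03), timedelta(seconds=1.45)),
]
print(word_timings(words))
```

With a real response you would pass `alternative.words` to `word_timings` instead of the stand-in list.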