How to improve speech-to-text accuracy with the recognize_sphinx API in Python

Question

vin*_*747 5 python speech-recognition speech-to-text cmusphinx google-speech-api

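How can I improve the accuracy of speech-to-text conversion with the recognize_sphinx Python API?

Please see the code below; I need to improve its baseline accuracy.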

import speech_recognition as sr

# Obtain path to "english.wav" in the same folder as this script
from os import path
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac")

# Use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # Read the entire audio file

# Recognize speech using Sphinx
try:
    print("Sphinx thinks you said " + r.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))

Answer 1

Sci*_*cia 5

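So, if I understand correctly, you are not getting the correct output for what the user or the audio file actually says. For example, the audio/user says "Hi, there!" and the output is something completely different.

Looking at your code, I noticed that you are using three different audio files, each in a different language. If you open the SpeechRecognition documentation, you will find a library reference. That library reference contains notes on using PocketSphinx. The first thing that stands out is:

By default, SpeechRecognition's Sphinx functionality supports only US English. Additional language packs are also available, but not included due to the files being too large

I assume you have already installed all the packages you need. I won't explain that part, since it is self-explanatory. In any case, the documentation also explains that you can:

Once installed, you can simply specify the language using the language parameter of recognizer_instance.recognize_sphinx. For example, French would be specified with "fr-FR" and Mandarin with "zh-CN".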
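Before passing a language tag, it may be worth checking that the matching language pack is actually present. The sketch below is only an illustration: it assumes that packs live in a "pocketsphinx-data" folder inside the installed speech_recognition package, one subfolder per language tag, and the helper names (language_pack_dir, is_language_installed, find_speech_recognition_dir) are made up for this example rather than part of the library.

```python
import importlib.util
from os import path


def language_pack_dir(package_dir, language):
    # Assumed layout: <package_dir>/pocketsphinx-data/<language tag>
    return path.join(package_dir, "pocketsphinx-data", language)


def is_language_installed(package_dir, language):
    """Return True if a folder for this language tag exists under package_dir."""
    return path.isdir(language_pack_dir(package_dir, language))


def find_speech_recognition_dir():
    """Locate the speech_recognition package folder without importing it."""
    spec = importlib.util.find_spec("speech_recognition")
    if spec is None or spec.origin is None:
        return None
    return path.dirname(spec.origin)


package_dir = find_speech_recognition_dir()
if package_dir is None:
    print("speech_recognition is not installed")
else:
    for language in ("en-US", "fr-FR", "zh-CN"):
        found = is_language_installed(package_dir, language)
        print(language, "installed" if found else "missing")
```

If a tag is reported as missing, recognize_sphinx will most likely fail for that language until the pack is installed as described in the library reference.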

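I'm not sure whether the code above is yours or whether you copied and pasted it from somewhere. Either way, there are some problems with it. You keep overwriting the AUDIO_FILE variable with another file. So instead of obtaining the path to "english.wav" in the same folder as this script, you end up with the path to "chinese.flac".

Now I think you can see what the problem with the "speech-to-text accuracy" might be. It is "listening" to Chinese and trying to output it as English words. That is pretty self-explanatory...

To fix this, simply add the language parameter and set it to the language of the file you are transcribing. For example,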
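The overwrite is easy to reproduce on its own. In this small sketch the folder name is just a placeholder (it is not from the question); the point is only that each assignment replaces the previous one, so the last file wins.

```python
from os import path

# Placeholder folder, used only for illustration.
base_dir = "demo_folder"

# Three assignments to the same name: only the last one survives.
AUDIO_FILE = path.join(base_dir, "english.wav")
AUDIO_FILE = path.join(base_dir, "french.aiff")
AUDIO_FILE = path.join(base_dir, "chinese.flac")

print(path.basename(AUDIO_FILE))  # prints: chinese.flac
```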

import speech_recognition as sr

# Obtain path to "chinese.flac" in the same folder as this script
from os import path

# AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
# AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac")

# Use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # Read the entire audio file

# Recognize speech using Sphinx
try:
    # Just pass a language parameter
    print("Sphinx thinks you said " + r.recognize_sphinx(audio, language="zh-CN"))
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))
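If you really do want to transcribe all three files, one option is to keep each file paired with its language tag instead of reusing a single variable. This is a rough sketch under a few assumptions: the mapping LANGUAGE_BY_FILE and the helpers transcribe and transcribe_all are names invented here, the "fr-FR" and "zh-CN" language packs are already installed, and files that are not found are simply skipped.

```python
from os import path

# Hypothetical pairing of each sample file with its language tag.
LANGUAGE_BY_FILE = {
    "english.wav": "en-US",
    "french.aiff": "fr-FR",
    "chinese.flac": "zh-CN",
}


def transcribe(audio_path, language):
    """Transcribe one file with Sphinx; return None if nothing was understood."""
    # Imported here so the mapping above can be used without the library.
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = r.record(source)  # Read the entire audio file
    try:
        return r.recognize_sphinx(audio, language=language)
    except sr.UnknownValueError:
        return None
    # sr.RequestError is left to propagate, e.g. when a language pack is missing.


def transcribe_all(folder):
    """Return {filename: text or None} for every mapped file found in folder."""
    results = {}
    for filename, language in LANGUAGE_BY_FILE.items():
        audio_path = path.join(folder, filename)
        if not path.exists(audio_path):
            continue  # Skip samples that are not present
        results[filename] = transcribe(audio_path, language)
    return results
```

Calling transcribe_all(path.dirname(path.realpath(__file__))) from the script would then process whichever of the three files sit next to it, each with its own language.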

Archived:	7 years, 10 months ago
Views:	7755
Last activity:	3 years, 2 months ago