How to extract a variable from speech recognition

Ran*_*ger 5 c# speech-recognition system.speech.recognition

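I am using System.Speech to recognize some phrases or words. One of them is "Set timer". I would like to extend this to "Set timer for X seconds" and have the code set a timer for X seconds. Is this possible? I have almost no experience with this so far, and all I could find is that I have to do something with the Grammar class.

Right now I have set up the recognition engine like this: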

SpeechRecognitionEngine = new SpeechRecognitionEngine();
SpeechRecognitionEngine.SetInputToDefaultAudioDevice();

var choices = new Choices();
choices.Add("Set timer");

var gb = new GrammarBuilder();
gb.Append(choices);
var g = new Grammar(gb);

SpeechRecognitionEngine.LoadGrammarAsync(g);

SpeechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
SpeechRecognitionEngine.SpeechRecognized += OnSpeechRecognized;

Is there a way to do this?

Evk*_*Evk 6

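First, there is no built-in concept of a number. Speech is just a sequence of words, so if you need to recognize numbers, you need to recognize the words that represent them, such as "one" and "fifteen". Some numbers are represented by several words, such as "one hundred" or "fifty one", and you need to recognize those too.

You can start by recognizing the numbers from 1 to 9: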

var engine = new SpeechRecognitionEngine(CultureInfo.GetCultureInfo("en-US"));
engine.SetInputToDefaultAudioDevice();
var num1To9 = new Choices(
    new SemanticResultValue("one", 1),
    new SemanticResultValue("two", 2),
    new SemanticResultValue("three", 3),
    new SemanticResultValue("four", 4),
    new SemanticResultValue("five", 5),
    new SemanticResultValue("six", 6),
    new SemanticResultValue("seven", 7),
    new SemanticResultValue("eight", 8),
    new SemanticResultValue("nine", 9));

var gb = new GrammarBuilder();
gb.Culture = CultureInfo.GetCultureInfo("en-US");
gb.Append("set timer for");
gb.Append(num1To9);
gb.Append("seconds");
var g = new Grammar(gb);

engine.LoadGrammar(g); // better not use LoadGrammarAsync
engine.SpeechRecognized += OnSpeechRecognized;
engine.RecognizeAsync(RecognizeMode.Multiple);
Console.WriteLine("Speak");
Console.ReadKey();

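So our grammar reads as:

  • the phrase "set timer for"
  • followed by "one" or "two" or "three"...
  • followed by "seconds"

We use SemanticResultValue to assign a tag to a specific phrase. In this case the tag is the number (1, 2, 3...) that corresponds to a specific word ("one", "two", "three"). Having done that, you can extract the value from the recognition result: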

private static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e) {
    var numSeconds = (int)e.Result.Semantics.Value;
    Console.WriteLine($"Starting timer for {numSeconds} seconds...");
}

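This is already a working example that recognizes phrases like "set timer for five seconds" and lets you extract the semantic value (5) from them.

Now you can combine various number words, for example: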

var num10To19 = new Choices(
    new SemanticResultValue("ten", 10),
    new SemanticResultValue("eleven", 11),
    new SemanticResultValue("twelve", 12),
    new SemanticResultValue("thirteen", 13),
    new SemanticResultValue("fourteen", 14),
    new SemanticResultValue("fifteen", 15),
    new SemanticResultValue("sixteen", 16),
    new SemanticResultValue("seventeen", 17),
    new SemanticResultValue("eighteen", 18),
    new SemanticResultValue("nineteen", 19)
);

var numTensFrom20To90 = new Choices(
    new SemanticResultValue("twenty", 20),
    new SemanticResultValue("thirty", 30),
    new SemanticResultValue("forty", 40),
    new SemanticResultValue("fifty", 50),
    new SemanticResultValue("sixty", 60),
    new SemanticResultValue("seventy", 70),
    new SemanticResultValue("eighty", 80),
    new SemanticResultValue("ninety", 90)
);

var num20to99 = new GrammarBuilder();
// first word is "twenty", "thirty" etc
num20to99.Append(numTensFrom20To90);
// followed by ONE OR ZERO "digit" words ("one", "two", "three" etc)
num20to99.Append(num1To9, 0, 1);

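But assigning a correct semantic value to such combinations gets tricky, because the GrammarBuilder API is not powerful enough to compute it for you.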
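One possible workaround that stays within GrammarBuilder is to avoid computing a single value in the grammar at all: tag the tens word and the optional digit word with separate keys and add them up in the handler. The following is an untested sketch under that assumption. The class name TwoDigitGrammarSketch, the method names, and the key names "tens" and "ones" are made up for illustration; it expects the numTensFrom20To90 and num1To9 Choices defined above.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

public static class TwoDigitGrammarSketch
{
    // Pure helper: "twenty" (20) plus an optional "five" (5) gives 25.
    public static int Combine(int tens, int? ones) => tens + (ones ?? 0);

    // Builds "set timer for <tens> [<ones>] seconds", storing each part under its own key.
    public static Grammar Build(Choices numTensFrom20To90, Choices num1To9)
    {
        var tensPart = new GrammarBuilder(
            new SemanticResultKey("tens", new GrammarBuilder(numTensFrom20To90)));
        var onesPart = new GrammarBuilder(
            new SemanticResultKey("ones", new GrammarBuilder(num1To9)));

        var gb = new GrammarBuilder();
        gb.Culture = CultureInfo.GetCultureInfo("en-US");
        gb.Append("set timer for");
        gb.Append(tensPart);
        gb.Append(onesPart, 0, 1); // the digit word is optional (zero or one times)
        gb.Append("seconds");
        return new Grammar(gb);
    }

    // Reads both keys from the recognition result; "ones" is absent for "twenty", "thirty", etc.
    public static int ReadSeconds(SemanticValue semantics)
    {
        var tens = (int)semantics["tens"].Value;
        int? ones = semantics.ContainsKey("ones") ? (int)semantics["ones"].Value : (int?)null;
        return Combine(tens, ones);
    }
}
```

In the handler you would call TwoDigitGrammarSketch.ReadSeconds(e.Result.Semantics). This only covers 20 to 99 and does not scale well to larger numbers, which is where the xml grammars described next come in.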

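If what you want cannot be done (easily) with GrammarBuilder and its related classes, you have to use the more powerful xml grammar files, whose format is defined in the W3C Speech Recognition Grammar Specification (SRGS).

Describing these grammar files is beyond the scope of this question, but fortunately for your task there is already a grammar file provided with the Microsoft Speech SDK, which you may already have downloaded and installed. So copy the file from "C:\Program Files\Microsoft SDKs\Speech\v11.0\Samples\Sample Grammars\en-US.grxml" (or wherever you installed the SDK) and remove some irrelevant parts, such as the first <tag> element with the big CDATA inside.

The rule of interest in this file is named "Cardinal" and recognizes numbers from 0 to 1 million. Our code then becomes: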

var sampleDoc = new SrgsDocument(@"en-US-sample.grxml");
sampleDoc.Culture = CultureInfo.GetCultureInfo("en-US");
// define new rule, named Timer
SrgsRule rootRule = new SrgsRule("Timer");            
// match "set timer for" phrase
rootRule.Add(new SrgsItem("set timer for"));
// followed by whatever "Cardinal" rule defines (reference to another rule)
rootRule.Add(new SrgsRuleRef(sampleDoc.Rules["Cardinal"]));
// followed by "seconds"
rootRule.Add(new SrgsItem("seconds"));
// add to rules
sampleDoc.Rules.Add(rootRule);
// make it a root rule, so that it will be used for recognition
sampleDoc.Root = rootRule;
var g = new Grammar(sampleDoc);

engine.LoadGrammar(g); // better not use LoadGrammarAsync
engine.SpeechRecognized += OnSpeechRecognized;
engine.RecognizeAsync(RecognizeMode.Multiple);

The handler becomes:

private static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e) {
    var numSeconds = Convert.ToInt32(e.Result.Semantics.Value);
    Console.WriteLine($"Starting timer for {numSeconds} seconds...");
}
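
Now you can recognize numbers up to 1 million.

Of course, there is no need to define rules in code as we did above. You can define all rules entirely in xml, load the file as an SrgsDocument, and create a Grammar from it.

If you want to recognize multiple commands, here is an example: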

var sampleDoc = new SrgsDocument(@"en-US-sample.grxml");            
sampleDoc.Culture = CultureInfo.GetCultureInfo("en-US");
// this rule is the same as above
var setTimerRule = new SrgsRule("SetTimer");            
setTimerRule.Add(new SrgsItem("set timer for"));            
setTimerRule.Add(new SrgsRuleRef(sampleDoc.Rules["Cardinal"]));            
setTimerRule.Add(new SrgsItem("seconds"));            
sampleDoc.Rules.Add(setTimerRule);

// new rule, clear timer
var clearTimerRule = new SrgsRule("ClearTimer");
// just match this phrase
clearTimerRule.Add(new SrgsItem("clear timer"));
sampleDoc.Rules.Add(clearTimerRule);
// new root rule, matching either set timer OR clear timer
var rootRule = new SrgsRule("Timers");
rootRule.Add(new SrgsOneOf( // << OneOf is basically the same as Choices
    //               reference to SetTimer                                         
    new SrgsItem(new SrgsRuleRef(setTimerRule), 
        // assign command name. Both "command" and "settimer" are arbitrary names I chose
        new SrgsSemanticInterpretationTag("out = rules.latest();out.command = 'settimer';")),
    new SrgsItem(new SrgsRuleRef(clearTimerRule),
        // assign command name. If this rule "wins" - command will be cleartimer
        new SrgsSemanticInterpretationTag("out.command = 'cleartimer';"))
));
sampleDoc.Rules.Add(rootRule);
sampleDoc.Root = rootRule;
var g = new Grammar(sampleDoc);

The handler becomes:

private static void OnSpeechRecognized(object sender, SpeechRecognizedEventArgs e) {
    var sem = e.Result.Semantics;
    // here "command" is arbitrary key we assigned in our rule
    var commandName = (string) sem["command"].Value;
    switch (commandName) {
        // also arbitrary values we assigned, not related to rule names or something else
        case "settimer":
            var numSeconds = Convert.ToInt32(sem.Value);
            Console.WriteLine($"Starting timer for {numSeconds} seconds...");
            break;
        case "cleartimer":
            Console.WriteLine("timer cleared");
            break;
    }
}
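
The handler above only prints messages. As a rough sketch of what the two branches might actually do, here is a small helper built on System.Threading.Timer. The CountdownTimer class and its members are hypothetical names chosen for this example and are not part of System.Speech.

```csharp
using System;
using System.Threading;

// Hypothetical helper: a restartable one-shot countdown.
public sealed class CountdownTimer : IDisposable
{
    private readonly Action _onElapsed;
    private readonly object _gate = new object();
    private Timer _timer;

    public CountdownTimer(Action onElapsed)
    {
        if (onElapsed == null) throw new ArgumentNullException(nameof(onElapsed));
        _onElapsed = onElapsed;
    }

    // True while a countdown is pending.
    public bool IsRunning
    {
        get { lock (_gate) { return _timer != null; } }
    }

    // Starts a countdown, replacing any countdown that is already pending.
    public void Start(int seconds)
    {
        if (seconds < 0) throw new ArgumentOutOfRangeException(nameof(seconds));
        lock (_gate)
        {
            if (_timer != null) _timer.Dispose();
            // This constructor passes the Timer itself to the callback as its state.
            var timer = new Timer(OnTick);
            _timer = timer;
            timer.Change(TimeSpan.FromSeconds(seconds), Timeout.InfiniteTimeSpan);
        }
    }

    // Cancels the pending countdown, if any.
    public void Clear()
    {
        lock (_gate)
        {
            if (_timer != null) _timer.Dispose();
            _timer = null;
        }
    }

    private void OnTick(object state)
    {
        lock (_gate)
        {
            // Ignore a stale callback from a timer that was cleared or replaced.
            if (!ReferenceEquals(state, _timer)) return;
            _timer.Dispose();
            _timer = null;
        }
        _onElapsed();
    }

    public void Dispose()
    {
        Clear();
    }
}
```

With a shared instance such as `var timer = new CountdownTimer(() => Console.WriteLine("Time is up"));`, the "settimer" case would call timer.Start(numSeconds) and the "cleartimer" case would call timer.Clear(). The callback runs on a thread-pool thread, so UI code would need to marshal back to the UI thread.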

For completeness, here is how you do the same thing with pure xml. Open the "en-US-sample.grxml" file in an xml editor and add the rules we defined in code above. They look like this:

<rule id="SetTimer" scope="private">
    <item>set timer for</item>
    <item>
        <ruleref uri="#Cardinal" />
    </item>
    <item>seconds</item>
</rule>

<rule id="ClearTimer" scope="private">
    <item>clear timer</item>
</rule>

<rule id="Timers" scope="public">
    <one-of>
        <item>
            <ruleref uri="#SetTimer" />
            <tag>out = rules.latest(); out.command = 'settimer'</tag>
        </item>
        <item>
            <ruleref uri="#ClearTimer" />
            <tag>out.command = 'cleartimer'</tag>
        </item>
    </one-of>
</rule> 

Now set the root rule on the root grammar tag:

<grammar xml:lang="en-US" version="1.0" xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0" 
    root="Timers">

And save.

Now we do not need to define anything in code. All we need to do is load our grammar file:

var sampleDoc = new SrgsDocument(@"en-US-sample.grxml");                        
var g = new Grammar(sampleDoc);
engine.LoadGrammar(g);

That's it. Because the "Timers" rule is the root rule of the grammar file, it is used for recognition and behaves exactly like the version we defined in code.