Streaming with SpeechSynthesizer using SpeechAudioFormatInfo

Question


Ace*_*din 8 .net text-to-speech speech-synthesis speechsynthesizer

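I am using System.Speech.Synthesis.SpeechSynthesizer to convert text to speech. Because Microsoft's documentation is so thin (see my link: there are no remarks or code examples), I can't work out the difference between two methods: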

SetOutputToAudioStream and SetOutputToWaveStream.

Here is what I have deduced:

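SetOutputToAudioStream takes a stream and a SpeechAudioFormatInfo instance that defines the audio format (samples per second, bits per sample, audio channels, etc.) and writes the synthesized speech to the stream.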

SetOutputToWaveStream takes only a stream and writes a 16-bit, mono, 22 kHz PCM wave file to it. There is no way to pass in a SpeechAudioFormatInfo.

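My problem is that SetOutputToAudioStream does not write a valid wave file to the stream. For example, when I pass the stream to System.Media.SoundPlayer, I get an InvalidOperationException ("wave header is corrupt"). If I write the stream to disk and try to play it with WMP, I get a "Windows Media Player cannot play the file..." error, whereas the stream written by SetOutputToWaveStream plays correctly in both. My theory is that SetOutputToAudioStream is not writing a (valid) header.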
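One way to test that theory is to look at the first 12 bytes of the stream, since a wave file starts with the ASCII tags "RIFF" and "WAVE". The following is only a minimal sketch: `WaveProbe` and `LooksLikeWave` are hypothetical names made up for illustration, and the check assumes a seekable stream such as a MemoryStream.

```csharp
using System;
using System.IO;
using System.Text;

static class WaveProbe
{
    // Returns true when the stream begins with the 12-byte RIFF/WAVE signature.
    // The caller's stream position is restored afterwards.
    public static bool LooksLikeWave(Stream s)
    {
        if (!s.CanSeek) throw new ArgumentException("Stream must be seekable.", nameof(s));
        long saved = s.Position;
        try
        {
            s.Position = 0;
            byte[] head = new byte[12];
            int read = 0;
            while (read < head.Length)
            {
                int n = s.Read(head, read, head.Length - read);
                if (n == 0) return false; // too short to hold a RIFF header
                read += n;
            }
            // Bytes 0-3 hold "RIFF", bytes 4-7 the chunk size, bytes 8-11 "WAVE".
            return Encoding.ASCII.GetString(head, 0, 4) == "RIFF"
                && Encoding.ASCII.GetString(head, 8, 4) == "WAVE";
        }
        finally
        {
            s.Position = saved;
        }
    }
}
```

If the theory is right, this should return false for a stream filled by SetOutputToAudioStream and true for one filled by SetOutputToWaveStream.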

Oddly, the naming convention of the SetOutputTo*Blah* methods is inconsistent: SetOutputToWaveFile takes a SpeechAudioFormatInfo, while SetOutputToWaveStream does not.

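I need to be able to write an 8 kHz, 16-bit, mono wave file to a stream, which neither SetOutputToAudioStream nor SetOutputToWaveStream allows me to do. Does anyone have insight into SpeechSynthesizer and these two methods?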

For reference, here is some code:

Stream ret = new MemoryStream();
using (SpeechSynthesizer synth = new SpeechSynthesizer())
{
  synth.SelectVoice(voiceName);
  synth.SetOutputToWaveStream(ret);
  //synth.SetOutputToAudioStream(ret, new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
  synth.Speak(textToSpeak);
}


Solution:

Many thanks to @Hans Passant. Here is the gist of what I am using now:

// Requires: using System.Reflection;
Stream ret = new MemoryStream();
using (SpeechSynthesizer synth = new SpeechSynthesizer())
{
  // SetOutputStream is a private method, so it is looked up via reflection.
  var mi = synth.GetType().GetMethod("SetOutputStream", BindingFlags.Instance | BindingFlags.NonPublic);
  var fmt = new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono);
  mi.Invoke(synth, new object[] { ret, fmt, true, true });
  synth.SelectVoice(voiceName);
  synth.Speak(textToSpeak);
}
ret.Position = 0; // rewind so the caller reads from the start of the audio
return ret;


In my rough testing it works well. Using reflection is a bit icky, but it beats writing the file to disk and opening a stream on it.

Answer 1

Han*_*ant 8

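Your code snippet is broken: you are using synth after it has been disposed. But I'm sure that is not the real problem. SetOutputToAudioStream produces raw PCM audio, just the "numbers", without a container file format (header) like the one used in a .wav file. And yes, that cannot be played back by a regular media program.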
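To make the "raw PCM without a header" point concrete, here is a rough sketch of wrapping raw samples in the standard 44-byte PCM header of a .wav file. `WavWrapper` and `WrapPcm` are hypothetical names invented for this illustration, not part of System.Speech, and the sketch assumes uncompressed PCM with an even number of data bytes (no RIFF padding byte is added).

```csharp
using System;
using System.IO;
using System.Text;

static class WavWrapper
{
    // Prepends a canonical 44-byte PCM WAV header to raw PCM samples.
    // BinaryWriter always writes little-endian, which is what RIFF expects.
    public static byte[] WrapPcm(byte[] pcm, int sampleRate, short bitsPerSample, short channels)
    {
        short blockAlign = (short)(channels * bitsPerSample / 8); // bytes per sample frame
        int byteRate = sampleRate * blockAlign;                   // bytes per second

        using (var ms = new MemoryStream(44 + pcm.Length))
        using (var w = new BinaryWriter(ms, Encoding.ASCII))
        {
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + pcm.Length);   // RIFF chunk size = total file size - 8
            w.Write(Encoding.ASCII.GetBytes("WAVE"));
            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(16);                // size of the fmt chunk for plain PCM
            w.Write((short)1);          // format tag 1 = PCM
            w.Write(channels);
            w.Write(sampleRate);
            w.Write(byteRate);
            w.Write(blockAlign);
            w.Write(bitsPerSample);
            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(pcm.Length);        // size of the sample data in bytes
            w.Write(pcm);
            w.Flush();
            return ms.ToArray();
        }
    }
}
```

For the 8 kHz, 16-bit, mono case in the question, the bytes captured from SetOutputToAudioStream could be passed as `WavWrapper.WrapPcm(pcmBytes, 8000, 16, 1)`, provided the format arguments match the SpeechAudioFormatInfo that was used.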

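The missing overload of SetOutputToWaveStream that takes a SpeechAudioFormatInfo is strange. It really does look like an oversight to me, even though that is extremely rare in the .NET framework. There is no compelling reason why it shouldn't work; the underlying SAPI interface does support it. It can be hacked around with reflection, by calling the private SetOutputStream method. It worked fine when I tested it, but I can't vouch for it: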

using System.Reflection;
...
            using (Stream ret = new MemoryStream())
            using (SpeechSynthesizer synth = new SpeechSynthesizer()) {
                var mi = synth.GetType().GetMethod("SetOutputStream", BindingFlags.Instance | BindingFlags.NonPublic);
                var fmt = new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Eight, AudioChannel.Mono);
                mi.Invoke(synth, new object[] { ret, fmt, true, true });
                synth.Speak("Greetings from stack overflow");
                // Testing code:
                using (var fs = new FileStream(@"c:\temp\test.wav", FileMode.Create, FileAccess.Write, FileShare.None)) {
                    ret.Position = 0;
                    byte[] buffer = new byte[4096];
                    for (;;) {
                        int len = ret.Read(buffer, 0, buffer.Length);
                        if (len == 0) break;
                        fs.Write(buffer, 0, len);
                    }
                }
            }


If you are uncomfortable with the hack, then using Path.GetTempFileName() to temporarily stream the output to a file will certainly work.
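A minimal sketch of that temp-file route, assuming the public SetOutputToWaveFile(path, formatInfo) overload mentioned in the question. `TempFileCapture` and `Capture` are hypothetical helper names; the synthesizer calls appear only in a comment because they need System.Speech on Windows, and the SetOutputToNull call is my assumption about how to release the file before reading it back.

```csharp
using System;
using System.IO;

static class TempFileCapture
{
    // Lets 'writeToPath' fill a temporary file, then loads that file into memory.
    // The temporary file is deleted even if the callback throws.
    public static MemoryStream Capture(Action<string> writeToPath)
    {
        string path = Path.GetTempFileName();
        try
        {
            writeToPath(path);
            return new MemoryStream(File.ReadAllBytes(path));
        }
        finally
        {
            File.Delete(path);
        }
    }
}

// Intended use with the synthesizer (not compiled here):
//   var fmt = new SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono);
//   MemoryStream wav = TempFileCapture.Capture(path =>
//   {
//       synth.SetOutputToWaveFile(path, fmt);
//       synth.Speak(textToSpeak);
//       synth.SetOutputToNull(); // assumed necessary to close the file before it is read
//   });
```

The returned MemoryStream is positioned at the start, so it can be handed straight to a consumer that expects a complete wave file.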
