Using the System.Speech.Synthesis.SpeechSynthesizer class in .NET 3.5, the AudioPosition property of SpeakProgressEventArgs appears to be inaccurate.
The following code produces the following output:
Code:
using System;
using System.Speech.Synthesis;
using System.Threading;

namespace SpeechTest
{
    class Program
    {
        static ManualResetEvent speechDoneEvent = new ManualResetEvent(false);

        static void Main(string[] args)
        {
            SpeechSynthesizer synthesizer = new SpeechSynthesizer();
            synthesizer.SpeakProgress += new EventHandler<SpeakProgressEventArgs>(synthesizer_SpeakProgress);
            synthesizer.SpeakCompleted += new EventHandler<SpeakCompletedEventArgs>(synthesizer_SpeakCompleted);
            synthesizer.SetOutputToWaveFile("Test.wav");
            synthesizer.SpeakAsync("This holiday season, support the music you love by shopping at Made in Washington, online and at one of five local stores. Made in Washington chocolates, bountiful gift baskets and ornaments are the perfect holiday gifts for family, friends …

I am currently developing an application that needs to transmit speech encoded in a specific audio format.
System.Speech.AudioFormat.SpeechAudioFormatInfo synthFormat =
new System.Speech.AudioFormat.SpeechAudioFormatInfo(System.Speech.AudioFormat.EncodingFormat.Pcm,
8000, 16, 1, 16000, 2, null);
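This specifies that the audio is in PCM format: 8000 samples per second, 16 bits per sample, mono, 16000 average bytes per second, and a block alignment of 2.
When I execute the following code, nothing is written to my MemoryStream instance. However, when I change the rate from 8000 samples per second to 11025, the audio data is written successfully.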
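As a sanity check on those numbers, the last two fields can be derived from the first three. The sketch below is only an illustration: `PcmFormatMath` and its methods are hypothetical helpers, not part of System.Speech, and it assumes uncompressed PCM with a whole number of bytes per sample.

```csharp
using System;

// Hypothetical helper (not part of System.Speech): derives the two dependent
// PCM fields from the sample rate, bit depth, and channel count.
public static class PcmFormatMath
{
    // Bytes per sample frame, i.e. one sample for every channel.
    // For 16-bit mono: 1 * (16 / 8) = 2.
    public static int BlockAlign(int bitsPerSample, int channelCount)
    {
        return channelCount * (bitsPerSample / 8);
    }

    // Bytes of audio per second of playback.
    // For 8000 samples per second, 16-bit mono: 8000 * 2 = 16000.
    public static int AverageBytesPerSecond(int samplesPerSecond, int bitsPerSample, int channelCount)
    {
        return samplesPerSecond * BlockAlign(bitsPerSample, channelCount);
    }
}
```

Under these assumptions, `PcmFormatMath.AverageBytesPerSecond(8000, 16, 1)` agrees with the 16000 passed to the constructor, and `PcmFormatMath.BlockAlign(16, 1)` agrees with the 2. If only the sample rate is raised to 11025, the matching average byte rate would be 22050 rather than 16000.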
SpeechSynthesizer synthesizer = new SpeechSynthesizer();
waveStream = new MemoryStream();
PromptBuilder pbuilder = new PromptBuilder();
PromptStyle pStyle = new PromptStyle();
pStyle.Emphasis = PromptEmphasis.None;
pStyle.Rate = PromptRate.Fast;
pStyle.Volume = PromptVolume.ExtraLoud;
pbuilder.StartStyle(pStyle);
pbuilder.StartParagraph();
pbuilder.StartVoice(VoiceGender.Male, VoiceAge.Teen, 2);
pbuilder.StartSentence();
pbuilder.AppendText("This is some text.");
pbuilder.EndSentence();
pbuilder.EndVoice();
pbuilder.EndParagraph();
pbuilder.EndStyle();
synthesizer.SetOutputToAudioStream(waveStream, synthFormat);
synthesizer.Speak(pbuilder);
synthesizer.SetOutputToNull();
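No exception or error is logged when using the 8000 sample rate, and I could not find anything useful in the SetOutputToAudioStream documentation explaining why it succeeds at 11025 samples per second but not at 8000. I have a workaround involving a wav file that I generated and converted to the correct sample rate with some sound editing tools, but I would like to generate the audio from within the application if I can.
One particular point of interest is that SpeechRecognitionEngine accepts this audio format and successfully recognizes the speech in my synthesized wave file...
Update: I recently found that this audio format succeeds for some installed voices but fails for others. It fails specifically for LH Michael and LH Michelle, and the failure varies with certain voice settings defined in the PromptBuilder.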