Synchronizing video subtitles with a text-to-speech voice

Ahm*_*mad 8 c# audio speech-synthesis speechsynthesizer aforge

I am trying to create a video of text in which the text is narrated by text-to-speech.

To create the video file, I use the VideoFileWriter class of AForge.NET as follows:

VideoWriter = new VideoFileWriter();

VideoWriter.Open(CurVideoFile, (int)(Properties.Settings.Default.VideoWidth),
    (int)(Properties.Settings.Default.VideoHeight), 25, VideoCodec.MPEG4, 800000);

To read the text aloud, I use the SpeechSynthesizer class and write its output to a wave stream:

AudioStream = new FileStream(CurAudioFile, FileMode.Create);
synth.SetOutputToWaveStream(AudioStream);

I want to highlight the words in the video as they are spoken, so I synchronize them through the SpeakProgress event:

void synth_SpeakProgress(object sender, SpeakProgressEventArgs e)
{

    curAuidoPosition = e.AudioPosition;
    using (Graphics g = Graphics.FromImage(Screen))
    {
         g.DrawString(e.Text,....); 
    }                    
    VideoWriter.WriteVideoFrame(Screen, curAuidoPosition);
}
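To make the timing in this handler concrete: the writer was opened at 25 frames per second, so each frame covers 40 ms, and every `e.AudioPosition` falls into exactly one frame slot. The following is only a minimal sketch of that arithmetic, not part of AForge.NET or System.Speech; `FrameTiming`, `FrameIndexAt`, and `FramesToFill` are hypothetical names introduced for illustration.

```csharp
using System;

// Hypothetical helper (not an AForge.NET or System.Speech API).
public static class FrameTiming
{
    // Frame rate passed to VideoFileWriter.Open in the question.
    public const int FramesPerSecond = 25;

    // Index of the frame that is on screen at the given audio position.
    // Integer tick arithmetic avoids floating-point rounding at frame boundaries.
    public static long FrameIndexAt(TimeSpan audioPosition)
    {
        if (audioPosition < TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(audioPosition));
        return audioPosition.Ticks * FramesPerSecond / TimeSpan.TicksPerSecond;
    }

    // Number of frame slots between two consecutive word positions,
    // i.e. how many frames the previous image stays on screen.
    public static long FramesToFill(TimeSpan previous, TimeSpan current)
    {
        return Math.Max(0L, FrameIndexAt(current) - FrameIndexAt(previous));
    }
}
```

For example, `FrameTiming.FrameIndexAt(TimeSpan.FromSeconds(1))` gives 25, so a word reported at the one-second mark belongs to frame 25 of the video.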

Finally, I merge the video and audio using ffmpeg:

using (Process process = new Process())
{
        process.StartInfo.FileName = exe_path;
        process.StartInfo.Arguments = 
            string.Format(@"-i ""{0}"" -i ""{1}"" -y -acodec copy -vcodec copy ""{2}""", avi_path, mp3_path, output_file);

        // ...
}

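The problem is that for some voices, such as Microsoft Hazel, Zira, and David on Windows 8.1, the video is out of sync with the audio: the audio runs much faster than the displayed subtitles. For the voices on Windows 7, however, it works.

How can I synchronize them so that it works for any text-to-speech voice on any operating system?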

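It seems that e.AudioPosition is inaccurate, as mentioned in the question "Are the SpeakProgressEventArgs of the SpeechSynthesizer inaccurate?". I ran the same experiment and got the same result.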

I have noticed that if I adjust the audio format, I can get close to the actual time, but this does not work for every voice.

var formats = CurVoice.VoiceInfo.SupportedAudioFormats;
if (formats.Count > 0)
{
    var format = formats[0];
    reader.SetOutputToWaveFile(CurAudioFile, format);
}
else
{
    AudioStream = new FileStream(CurAudioFile, FileMode.Create);
    reader.SelectVoice(CurVoice.VoiceInfo.Name);
    var fmt = new SpeechAudioFormatInfo(16000, AudioBitsPerSample.Sixteen, AudioChannel.Mono);
    // This is closer, but still not precise.
    MemStream = new MemoryStream();
    // SetOutputStream is a non-public method, so it is invoked through reflection.
    var mi = reader.GetType().GetMethod("SetOutputStream", BindingFlags.Instance | BindingFlags.NonPublic);
    mi.Invoke(reader, new object[] { MemStream, fmt, true, true });
}
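One way to check how far e.AudioPosition drifts is to compare it with the time implied by the amount of audio actually written. For uncompressed PCM, that time is the byte count divided by the bytes per second of the format. Below is a minimal sketch under that assumption, using the 16 kHz, 16-bit, mono format from the code above; `PcmTiming` and `DurationOf` are hypothetical names, and the byte count is assumed to exclude any WAV header.

```csharp
using System;

// Hypothetical helper (not a System.Speech API); assumes uncompressed PCM data.
public static class PcmTiming
{
    // Playback time represented by pcmByteCount bytes of raw PCM audio.
    public static TimeSpan DurationOf(long pcmByteCount, int samplesPerSecond,
                                      int bitsPerSample, int channels)
    {
        if (pcmByteCount < 0)
            throw new ArgumentOutOfRangeException(nameof(pcmByteCount));
        if (samplesPerSecond <= 0 || channels <= 0 ||
            bitsPerSample <= 0 || bitsPerSample % 8 != 0)
            throw new ArgumentException("Unsupported PCM format.");

        // Bytes consumed by one second of audio in this format.
        long bytesPerSecond = (long)samplesPerSecond * channels * (bitsPerSample / 8);

        // Convert bytes to ticks with integer arithmetic (1 tick = 100 ns).
        return TimeSpan.FromTicks(pcmByteCount * TimeSpan.TicksPerSecond / bytesPerSecond);
    }
}
```

With the format above, `PcmTiming.DurationOf(32000, 16000, 16, 1)` is one second, since 16000 samples × 2 bytes × 1 channel is 32000 bytes per second. Comparing such a value with e.AudioPosition at the same moment would show the size of the drift for a given voice.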