C# - 捕获RTP流并发送到语音识别

dgr*_*eck 5 speech-recognition ffmpeg vlc stream rtp

我想要完成的事情:

  • 在C#中捕获RTP流
  • 将该流转发到System.Speech.SpeechRecognitionEngine

我正在创建一个基于Linux的机器人,它将接收麦克风输入,发送Windows机器,它将使用Microsoft语音识别处理音频并将响应发送回机器人.机器人可能距离服务器数百英里,所以我想在互联网上这样做.

到目前为止我做了什么:

  • 让机器人使用FFmpeg生成以MP3格式(其他可用格式)编码的RTP流(机器人在运行Arch Linux的Raspberry Pi上运行)
  • 使用VLC ActiveX控件在客户端计算机上捕获流
  • 发现SpeechRecognitionEngine具有可用的方法:
    1. recognizer.SetInputToWaveStream()
    2. recognizer.SetInputToAudioStream()
    3. recognizer.SetInputToDefaultAudioDevice()
  • 看着使用JACK将应用程序的输出发送到线路输入,但完全被它搞糊涂了.

我需要帮助的是:

我坚持如何实际将流从VLC发送到SpeechRecognitionEngine.VLC根本不公开流.有没有办法可以捕获流并将该流对象传递给SpeechRecognitionEngine?或者RTP不是解决方案吗?

在此先感谢您的帮助.

dgr*_*eck 5

经过大量工作,我终于Microsoft.SpeechRecognitionEngine接受了WAVE音频流.这是过程:

在Pi上,我有ffmpeg在运行.我使用此命令流式传输音频

ffmpeg -ac 1 -f alsa -i hw:1,0 -ar 16000 -acodec pcm_s16le -f rtp rtp://XXX.XXX.XXX.XXX:1234
Run Code Online (Sandbox Code Playgroud)

在服务器端,我创建一个UDPClient并侦听端口1234.我在一个单独的线程上接收数据包.首先,我剥离RTP头(这里解释的头格式)并将有效负载写入特殊流.我不得不使用Sean的响应描述SpeechStreamer类来使SpeechRecognitionEngine工作.它不符合标准.Memory Stream

我在语音识别方面唯一要做的就是将输入设置为音频流而不是默认的音频设备.

recognizer.SetInputToAudioStream( rtpClient.AudioStream,
    new SpeechAudioFormatInfo(WAVFile.SAMPLE_RATE, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
Run Code Online (Sandbox Code Playgroud)

我没有对它进行过大量的测试(即让它流几天并看它是否仍然有效),但是我能够保存音频样本SpeechRecognized并且听起来很棒.我使用的采样率为16 KHz.我可能会将其降低到8 KHz以减少数据传输量,但是一旦它成为问题我就会担心.

我还要提一下,响应非常快.我可以说一整句话并在不到一秒的时间内得到答复.RTP连接似乎为进程增加了很少的开销.我将不得不尝试一个基准测试并将其与仅使用MIC输入进行比较.

编辑:这是我的RTPClient类.

    /// <summary>
    /// Connects to an RTP stream and listens for data
    /// </summary>
    public class RTPClient
    {
        private const int AUDIO_BUFFER_SIZE = 65536;

        private UdpClient client;
        private IPEndPoint endPoint;
        private SpeechStreamer audioStream;
        private bool writeHeaderToConsole = false;
        private bool listening = false;
        private int port;
        private Thread listenerThread; 

        /// <summary>
        /// Returns a reference to the audio stream
        /// </summary>
        public SpeechStreamer AudioStream
        {
            get { return audioStream; }
        }
        /// <summary>
        /// Gets whether the client is listening for packets
        /// </summary>
        public bool Listening
        {
            get { return listening; }
        }
        /// <summary>
        /// Gets the port the RTP client is listening on
        /// </summary>
        public int Port
        {
            get { return port; }
        }

        /// <summary>
        /// RTP Client for receiving an RTP stream containing a WAVE audio stream
        /// </summary>
        /// <param name="port">The port to listen on</param>
        public RTPClient(int port)
        {
            Console.WriteLine(" [RTPClient] Loading...");

            this.port = port;

            // Initialize the audio stream that will hold the data
            audioStream = new SpeechStreamer(AUDIO_BUFFER_SIZE);

            Console.WriteLine(" Done");
        }

        /// <summary>
        /// Creates a connection to the RTP stream
        /// </summary>
        public void StartClient()
        {
            // Create new UDP client. The IP end point tells us which IP is sending the data
            client = new UdpClient(port);
            endPoint = new IPEndPoint(IPAddress.Any, port);

            listening = true;
            listenerThread = new Thread(ReceiveCallback);
            listenerThread.Start();

            Console.WriteLine(" [RTPClient] Listening for packets on port " + port + "...");
        }

        /// <summary>
        /// Tells the UDP client to stop listening for packets.
        /// </summary>
        public void StopClient()
        {
            // Set the boolean to false to stop the asynchronous packet receiving
            listening = false;
            Console.WriteLine(" [RTPClient] Stopped listening on port " + port);
        }

        /// <summary>
        /// Handles the receiving of UDP packets from the RTP stream
        /// </summary>
        /// <param name="ar">Contains packet data</param>
        private void ReceiveCallback()
        {
            // Begin looking for the next packet
            while (listening)
            {
                // Receive packet
                byte[] packet = client.Receive(ref endPoint);

                // Decode the header of the packet
                int version = GetRTPHeaderValue(packet, 0, 1);
                int padding = GetRTPHeaderValue(packet, 2, 2);
                int extension = GetRTPHeaderValue(packet, 3, 3);
                int csrcCount = GetRTPHeaderValue(packet, 4, 7);
                int marker = GetRTPHeaderValue(packet, 8, 8);
                int payloadType = GetRTPHeaderValue(packet, 9, 15);
                int sequenceNum = GetRTPHeaderValue(packet, 16, 31);
                int timestamp = GetRTPHeaderValue(packet, 32, 63);
                int ssrcId = GetRTPHeaderValue(packet, 64, 95);

                if (writeHeaderToConsole)
                {
                    Console.WriteLine("{0} {1} {2} {3} {4} {5} {6} {7} {8}",
                        version,
                        padding,
                        extension,
                        csrcCount,
                        marker,
                        payloadType,
                        sequenceNum,
                        timestamp,
                        ssrcId);
                }

                // Write the packet to the audio stream
                audioStream.Write(packet, 12, packet.Length - 12);
            }
        }

        /// <summary>
        /// Grabs a value from the RTP header in Big-Endian format
        /// </summary>
        /// <param name="packet">The RTP packet</param>
        /// <param name="startBit">Start bit of the data value</param>
        /// <param name="endBit">End bit of the data value</param>
        /// <returns>The value</returns>
        private int GetRTPHeaderValue(byte[] packet, int startBit, int endBit)
        {
            int result = 0;

            // Number of bits in value
            int length = endBit - startBit + 1;

            // Values in RTP header are big endian, so need to do these conversions
            for (int i = startBit; i <= endBit; i++)
            {
                int byteIndex = i / 8;
                int bitShift = 7 - (i % 8);
                result += ((packet[byteIndex] >> bitShift) & 1) * (int)Math.Pow(2, length - i + startBit - 1);
            }
            return result;
        }
    }
Run Code Online (Sandbox Code Playgroud)