Converting a real-time MP3 audio stream to 8000/mulaw in Python

Question


use*_*108 5 python audio mp3 twilio mu-law

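I'm working with an API that delivers real-time audio as MP3 (44.1 kHz/16-bit), and I need to convert this stream to 8000 Hz mu-law. I've tried several solutions, but all of them ran into problems caused by the structure of MP3 data.

My current approach uses PyDub and Python's audioop module to decode and process each audio chunk as it arrives. However, I frequently hit errors that seem to come from trying to decode chunks that don't contain complete MP3 frames.

Here is a simplified version of my current code: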

```python
from pydub import AudioSegment
from pydub.exceptions import CouldntDecodeError
import audioop
import io

class StreamConverter:
    def __init__(self):
        self.state = None
        self.buffer = b''

    def convert_chunk(self, chunk):
        # Add the chunk to the buffer
        self.buffer += chunk

        # Try to decode the buffer
        try:
            audio = AudioSegment.from_mp3(io.BytesIO(self.buffer))
        except CouldntDecodeError:
            return None

        # If decoding was successful, empty the buffer
        self.buffer = b''

        # Ensure audio is mono
        if audio.channels != 1:
            audio = audio.set_channels(1)

        # Get audio data as bytes
        raw_audio = audio.raw_data

        # Sample rate conversion
        chunk_8khz, self.state = audioop.ratecv(raw_audio, audio.sample_width, audio.channels, audio.frame_rate, 8000, self.state)

        # μ-law conversion
        chunk_ulaw = audioop.lin2ulaw(chunk_8khz, audio.sample_width)

        return chunk_ulaw

# This is then used as follows:
converter = StreamConverter()
for chunk in audio_stream:
    if chunk is not None:
        ulaw_chunk = converter.convert_chunk(chunk)
        # send ulaw_chunk to twilio api
```

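I believe my problem stems from the fact that MP3 data is organized in frames: if a chunk doesn't contain complete frames, I can't decode the audio reliably. On top of that, a single frame can be split across two chunks, so I can't decode the chunks independently.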
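To make the frame-boundary problem concrete, here is a minimal sketch of how a buffer could be cut at frame boundaries before decoding. It assumes MPEG-1 Layer III audio (which covers a 44.1 kHz MP3 stream) and ignores ID3 tags, CRC checks and free-format bitrates. The names `frame_length` and `split_complete_frames` are hypothetical helpers, not part of PyDub or any other library. Note also that Layer III frames may borrow data from earlier frames (the bit reservoir), so decoding groups of frames separately can still cause audible glitches at the seams.

```python
# MPEG-1 Layer III bitrates in kbit/s, indexed by the 4-bit bitrate field.
# Index 0 ("free" format) and index 15 (invalid) are rejected in this sketch.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128,
                 160, 192, 224, 256, 320, None]
# MPEG-1 sample rates in Hz, indexed by the 2-bit sample-rate field.
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def frame_length(header):
    """Return the frame size in bytes for an MPEG-1 Layer III header,
    or None if the 4 bytes do not look like such a header."""
    if len(header) < 4 or header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        return None  # no 11-bit sync word
    version = (header[1] >> 3) & 0x03  # 3 means MPEG-1
    layer = (header[1] >> 1) & 0x03    # 1 means Layer III
    if version != 3 or layer != 1:
        return None
    bitrate = BITRATES_KBPS[header[2] >> 4]
    sample_rate = SAMPLE_RATES_HZ[(header[2] >> 2) & 0x03]
    if bitrate is None or sample_rate is None:
        return None
    padding = (header[2] >> 1) & 0x01
    return 144 * bitrate * 1000 // sample_rate + padding


def split_complete_frames(buffer):
    """Split buffer into (bytes of complete frames, leftover bytes).

    Bytes that are not part of a frame are skipped; a frame that is cut
    off at the end of the buffer is returned as leftover so it can be
    prepended to the next chunk.
    """
    pos = 0
    frames = bytearray()
    while pos + 4 <= len(buffer):
        length = frame_length(buffer[pos:pos + 4])
        if length is None:
            pos += 1  # not a header: resynchronise byte by byte
            continue
        if pos + length > len(buffer):
            break  # incomplete frame: wait for more data
        frames += buffer[pos:pos + length]
        pos += length
    return bytes(frames), buffer[pos:]
```

A converter could call `split_complete_frames(self.buffer)` on every chunk, decode only the first return value, and keep the second as the new buffer.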

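Does anyone have ideas on how to handle this? Is there a way to process the MP3 stream in real time while converting it to 8000/mulaw, perhaps with a different library or approach?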

Answer 1

小智 0

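Strategy 1:

You can use librosa (https://librosa.org/) to decode the MP3 stream as it arrives. Librosa has a function called load() that decodes MP3 data into a NumPy array. You can then use that array for the sample-rate conversion and the mu-law conversion. Here is some sample code: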

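```python
import audioop
import io

import librosa
import numpy as np

def convert_chunk(chunk):
    # load() returns (samples, sample_rate); samples are floats in [-1, 1].
    # Reading MP3 from a file-like object depends on the installed audio backend.
    audio, sr = librosa.load(io.BytesIO(chunk), sr=44100, mono=True)
    chunk_8khz = librosa.resample(audio, orig_sr=sr, target_sr=8000)
    # audioop expects integer PCM bytes, so convert the floats to 16-bit samples
    pcm16 = (np.clip(chunk_8khz, -1.0, 1.0) * 32767).astype('<i2').tobytes()
    chunk_ulaw = audioop.lin2ulaw(pcm16, 2)
    return chunk_ulaw
```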

This decodes each MP3 chunk and converts it to 8000/mulaw. The function returns a bytes object that can be sent to the Twilio API.
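The `audioop` module used for the μ-law step was deprecated in Python 3.11 and removed in Python 3.13. As a rough illustration of what that step does, here is a pure-Python sketch of G.711 μ-law encoding for 16-bit samples. The function names `linear_to_ulaw` and `pcm16_to_ulaw` are hypothetical, and the output is not guaranteed to match `audioop.lin2ulaw` byte for byte on every input.

```python
import struct

BIAS = 0x84   # 132, added before the segment (exponent) is determined
CLIP = 32635  # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample):
    """Encode one signed 16-bit sample as one mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # The segment number is the position of the highest set bit above bit 7.
    exponent = (magnitude >> 7).bit_length() - 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are stored inverted, so silence (0) encodes to 0xFF.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16_to_ulaw(pcm):
    """Encode little-endian 16-bit mono PCM bytes as mu-law bytes."""
    count = len(pcm) // 2
    samples = struct.unpack("<%dh" % count, pcm[:count * 2])
    return bytes(linear_to_ulaw(s) for s in samples)
```

Each 16-bit input sample becomes one output byte, so one second of 8 kHz mono audio is 8000 bytes of μ-law data.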

Strategy 2:

First convert the MP3 stream to a WAV stream, then perform the necessary conversions, like this:

```python
    # Replacement for StreamConverter.convert_chunk; also needs `import wave`.
    def convert_chunk(self, chunk):
        # Add the chunk to the buffer
        self.buffer += chunk

        # Try to decode the buffered MP3 data and re-encode it as WAV
        try:
            audio = AudioSegment.from_mp3(io.BytesIO(self.buffer))
            # Ensure audio is mono and 16-bit before exporting
            audio = audio.set_channels(1).set_sample_width(2)
            with wave.open(audio.export(format='wav'), 'rb') as wav_file:
                # Read only the PCM frames, skipping the WAV header
                pcm_data = wav_file.readframes(wav_file.getnframes())
        except Exception:
            return None

        # If decoding was successful, empty the buffer
        self.buffer = b''

        # Sample rate conversion
        chunk_8khz, self.state = audioop.ratecv(pcm_data, 2, 1, audio.frame_rate, 8000, self.state)

        # μ-law conversion
        chunk_ulaw = audioop.lin2ulaw(chunk_8khz, 2)

        return chunk_ulaw
```

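Converting the MP3 data to WAV (uncompressed PCM) first gives audioop plain samples to work with. Incomplete MP3 frames are handled by the buffering: when the buffer cannot be decoded yet, the method returns None and tries again once the next chunk has arrived.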

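Note that the sample width is set to 2 (16-bit) during the conversion. If your MP3 audio stream has a different sample width, you may need to adjust this accordingly.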
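Another option, which is not part of the strategies above, is to hand the frame handling to a single long-running decoder process instead of decoding each chunk separately. The sketch below assumes the `ffmpeg` command-line tool is installed and on the PATH. It pipes MP3 bytes into ffmpeg's stdin and reads raw 8 kHz mono mu-law from its stdout. The class name `FfmpegStreamConverter`, the `on_audio` callback and the 160-byte read size (20 ms of audio) are illustrative choices, not something the question or Twilio prescribes.

```python
import subprocess
import threading


def build_ffmpeg_command(sample_rate=8000):
    """Command that reads MP3 on stdin and writes raw mu-law on stdout."""
    return [
        "ffmpeg",
        "-loglevel", "error",
        "-f", "mp3", "-i", "pipe:0",  # MP3 input from stdin
        "-ac", "1",                   # mono
        "-ar", str(sample_rate),      # output sample rate in Hz
        "-f", "mulaw",                # headerless mu-law output
        "pipe:1",
    ]


class FfmpegStreamConverter:
    def __init__(self, on_audio, read_size=160):
        # One decoder process for the whole stream, so frames that span
        # chunk boundaries are reassembled by ffmpeg itself.
        self.proc = subprocess.Popen(
            build_ffmpeg_command(),
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )
        self.on_audio = on_audio
        self.read_size = read_size
        self.reader = threading.Thread(target=self._pump, daemon=True)
        self.reader.start()

    def _pump(self):
        # Forward decoded audio to the callback until ffmpeg closes stdout.
        while True:
            data = self.proc.stdout.read(self.read_size)
            if not data:
                break
            self.on_audio(data)

    def feed(self, chunk):
        # Write one MP3 chunk of any size; it need not align with frames.
        self.proc.stdin.write(chunk)
        self.proc.stdin.flush()

    def close(self):
        # Signal end of input and wait for the remaining audio to drain.
        self.proc.stdin.close()
        self.reader.join()
        self.proc.wait()


# Usage sketch (audio_stream and send_to_twilio are placeholders):
#   converter = FfmpegStreamConverter(on_audio=send_to_twilio)
#   for chunk in audio_stream:
#       converter.feed(chunk)
#   converter.close()
```

ffmpeg buffers some input before it produces output, so the first audio may arrive with a short delay. Reading on a separate thread keeps `feed` from blocking while that output is being consumed.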
