FFmpeg Opus choppy sound (updated with explanation)

eas*_*ezy 5 c++ ffmpeg resampling opus

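I am using FFmpeg and trying to encode and decode raw PCM sound to Opus with FFmpeg's built-in "opus" codec. My input samples are raw PCM, 8000 Hz, 16-bit mono, in AV_SAMPLE_FMT_S16 format. Since that codec accepts only the AV_SAMPLE_FMT_FLTP sample format and a 48000 Hz sample rate, I resample my samples before encoding.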

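I have two instances of a ResamplerAudio class, which does the work of resampling audio samples and has an SwrContext member. The first ResamplerAudio instance resamples the raw PCM input audio before encoding; the second resamples the decoded audio so that its format and sample rate match those of the raw input audio.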

The ResamplerAudio class has a function that initializes its SwrContext member, as shown below:

void ResamplerAudio::init(AVCodecContext *codecContext, int inSampleRate, int outSampleRate, AVSampleFormat inSampleFmt, AVSampleFormat outSampleFmt)
{
    swrContext = swr_alloc();
    if (!swrContext)
    {
        LOGE(TAG, "[init] Couldn't allocate swr context");
        return;
    }

    av_opt_set_int(swrContext, "in_channel_layout", (int64_t) codecContext->channel_layout, 0);
    av_opt_set_int(swrContext, "out_channel_layout", (int64_t) codecContext->channel_layout,  0);

    av_opt_set_int(swrContext, "in_channel_count", codecContext->channels, 0);
    av_opt_set_int(swrContext, "out_channel_count", codecContext->channels, 0);

    av_opt_set_int(swrContext, "in_sample_rate", inSampleRate, 0);
    av_opt_set_int(swrContext, "out_sample_rate", outSampleRate, 0);

    av_opt_set_sample_fmt(swrContext, "in_sample_fmt", inSampleFmt, 0);
    av_opt_set_sample_fmt(swrContext, "out_sample_fmt", outSampleFmt,  0);

    int ret = swr_init(swrContext);
    if (ret < 0)
    {
        LOGE(TAG, "[init] swr_init error: %s", av_err2str(ret));
        return;
    }

    LOGD(TAG, "[init] success codecContext->channel_layout: %d; inSampleRate: %d; outSampleRate: %d; inSampleFmt: %d; outSampleFmt: %d", (int) codecContext->channel_layout, inSampleRate, outSampleRate, inSampleFmt, outSampleFmt);
}

I call ResamplerAudio::init with the following parameters for the first ResamplerAudio instance (this instance resamples the raw PCM input audio before encoding; I call it resamplerEncoder):

resamplerEncoder->init(contextEncoder, 8000, 48000, AV_SAMPLE_FMT_S16, AV_SAMPLE_FMT_FLTP);

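The second ResamplerAudio instance (this one resamples the audio after it has been decoded from Opus; I call it resamplerDecoder) is initialized with the following parameters: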

resamplerDecoder->init(contextDecoder, 48000, 8000, AV_SAMPLE_FMT_FLTP, AV_SAMPLE_FMT_S16);

The ResamplerAudio function that does the resampling looks like this:

std::vector<uint8_t> ResamplerAudio::convert(uint8_t **inData, int inSamplesCount, int outChannels, int outFormat)
{
    std::vector<uint8_t> result;
    uint8_t *dstData = NULL;
    // Upper bound on the output samples for this input, including samples buffered in swrContext
    const int dstNbSamples = swr_get_out_samples(swrContext, inSamplesCount);
    if (dstNbSamples <= 0) return result;
    if (av_samples_alloc(&dstData, NULL, outChannels, dstNbSamples, AVSampleFormat(outFormat), 1) < 0) return result;

    // swr_convert returns the number of samples per channel written, or a negative error code
    int resampledSize = swr_convert(swrContext, &dstData, dstNbSamples, (const uint8_t **)inData, inSamplesCount);
    if (resampledSize > 0)
    {
        int dstBufSize = av_samples_get_buffer_size(NULL, outChannels, resampledSize, AVSampleFormat(outFormat), 1);
        if (dstBufSize > 0) result.assign(dstData, dstData + dstBufSize);
    }

    av_freep(&dstData); // release the buffer from av_samples_alloc, which was otherwise leaked on every call
    return result;
}

I call ResamplerAudio::convert before encoding, with the following parameters:

// data - an array of raw pcm audio
// dataLength - the length of data array
// getSamplesCount() - function that calculates samples count
// frameEncode - the AVFrame used for encoding audio
std::vector<uint8_t> resampledData = resamplerEncoder->convert(&data, getSamplesCount(dataLength, frameEncode->channels, AV_SAMPLE_FMT_S16), frameEncode->channels, frameEncode->format);

The getSamplesCount() function looks like this:

int getSamplesCount(int bytesCount, int channels, AVSampleFormat format)
{
    return bytesCount / av_get_bytes_per_sample(format) / channels;
}

After that, I fill my frameEncode with the resampled samples:

memcpy(&frame->data[0][0], &resampledData[0], sizeof(uint8_t) * resampledDataLength);

I then pass frameEncode to the encoder by calling encodeFrame(resampledDataLength):

void encodeFrame(int dataLength)
{
    /* send the frame for encoding */
    int ret = avcodec_send_frame(contextEncoder, frameEncode);
    if (ret < 0)
    {
        LOGE(TAG, "[encodeFrame] avcodec_send_frame error: %s", av_err2str(ret));
        return;
    }

    /* read all the available output packets (in general there may be any number of them) */
    while (ret >= 0)
    {
        ret = avcodec_receive_packet(contextEncoder, packetEncode);
        if (ret < 0 && ret != AVERROR(EAGAIN)) LOGE(TAG, "[encodeFrame] error in avcodec_receive_packet: %s", av_err2str(ret));
        if (ret < 0) break;

        // encodedData - std::vector<uint8_t> that stores encoded data
        std::copy(&packetEncode->data[0], &packetEncode->data[dataLength], std::back_inserter(encodedData));
        av_packet_unref(packetEncode);
    }
}
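
A note on the frames handed to avcodec_send_frame: unless the codec advertises AV_CODEC_CAP_VARIABLE_FRAME_SIZE, the encoder expects every frame except the last to carry exactly contextEncoder->frame_size samples, so resampler output of arbitrary length usually has to be regrouped first. The sketch below illustrates that regrouping under stated assumptions: mono float samples, and a hypothetical SampleFifo class that is not part of the code above. FFmpeg's own AVAudioFifo serves the same purpose.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: buffers mono float samples and hands them out in fixed-size frames.
class SampleFifo
{
public:
    explicit SampleFifo(size_t frameSize) : frameSize_(frameSize) {}

    // Appends resampled samples of any length.
    void push(const float *samples, size_t count)
    {
        buffer_.insert(buffer_.end(), samples, samples + count);
    }

    // Moves exactly frameSize samples into 'frame' and returns true, or returns
    // false (leaving the buffer untouched) if not enough samples are queued yet.
    bool pop(std::vector<float> &frame)
    {
        if (frameSize_ == 0 || buffer_.size() < frameSize_) return false;
        frame.assign(buffer_.begin(), buffer_.begin() + frameSize_);
        buffer_.erase(buffer_.begin(), buffer_.begin() + frameSize_);
        return true;
    }

    // Number of samples still waiting for a full frame.
    size_t pending() const { return buffer_.size(); }

private:
    size_t frameSize_;
    std::vector<float> buffer_;
};
```

With something like this, each successful pop() would fill one frameEncode (with nb_samples set to the frame size) before avcodec_send_frame is called, and leftover samples would wait for the next input buffer.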

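Then I decode my encoded samples and resample them to get them back in the source sample format and sample rate, so I call ResamplerAudio::convert on resamplerDecoder with the following parameters: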

// frameDecode - AVFrame that holds decoded audio
std::vector<uint8_t> resampledData = resamplerDecoder->convert(frameDecode->data, frameDecode->nb_samples, frameDecode->channels, AV_SAMPLE_FMT_S16);

The resulting sound is choppy, and I also noticed that the decoded array is larger than the source array of raw PCM audio.

Any ideas what I am doing wrong?

UPD 18.05.2020

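I tested my resampling logic by resampling the raw PCM sound without any encoding or decoding. First, I converted the sample rate of the input sound from 8000 Hz to 48000 Hz, then took the resampled sound from that step and converted it from 48000 Hz back to 8000 Hz; the resulting sound was perfect and clean. I then repeated the same steps, but instead of the sample rate I converted the sample format from AV_SAMPLE_FMT_S16 to AV_SAMPLE_FMT_FLTP and back, and the resulting sound was again perfect and clean. I got the same result when I converted both the sample rate and the sample format at once. So I assume the distortion and choppiness come from my encoding or decoding routine, most likely the decoding routine, because after decoding I always get an AVFrame with nb_samples of 960, regardless of the size of the input sound.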
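
For context on the 960 figure: 960 samples at 48000 Hz is 20 ms of audio, a common Opus packet duration, so a fixed nb_samples of 960 per decoded frame may simply reflect the packet duration rather than a fault. The arithmetic can be sketched as follows, assuming 20 ms packets and mono S16 input at 8000 Hz; the helper names are hypothetical.

```cpp
// Hypothetical helper: samples per channel in a frame of the given duration.
constexpr int samplesPerFrame(int sampleRate, int frameMs)
{
    return sampleRate * frameMs / 1000;
}

// Hypothetical helper: bytes of packed PCM needed for that many samples.
constexpr int bytesPerFrame(int samples, int channels, int bytesPerSample)
{
    return samples * channels * bytesPerSample;
}

// 20 ms at 48000 Hz is the 960 samples seen in each decoded AVFrame.
static_assert(samplesPerFrame(48000, 20) == 960, "20 ms at 48 kHz");
// The same 20 ms at the 8000 Hz input rate is 160 samples ...
static_assert(samplesPerFrame(8000, 20) == 160, "20 ms at 8 kHz");
// ... which is 320 bytes of mono 16-bit PCM.
static_assert(bytesPerFrame(160, 1, 2) == 320, "mono S16 frame size");
```

Under these assumptions, the input would need to be fed in multiples of 320 bytes for the input and output sizes to line up exactly.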

My decoding routine looks like this:

std::vector<uint8_t> decode(uint8_t *data, unsigned int dataLength)
{
    decodedData.clear();

    int dataSize = dataLength;

    while (dataSize > 0)
    {
        if (!frameDecode)
        {
            frameDecode = av_frame_alloc();
            if (!frameDecode)
            {
                LOGE(TAG, "[decode] Couldn't allocate the frame");
                return EMPTY_DATA;
            }
        }

        ret = av_parser_parse2(parser, contextDecoder, &packetDecode->data, &packetDecode->size, &data[0], dataSize, AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
        if (ret < 0) {
            LOGE(TAG, "[decode] av_parser_parse2 error: %s", av_err2str(ret));
            return EMPTY_DATA;
        }

        data += ret;
        dataSize -= ret;

        doDecode();
    }
    return decodedData;
}

void doDecode()
{
    if (packetDecode->size) {
        /* send the packet with the compressed data to the decoder */
        int ret = avcodec_send_packet(contextDecoder, packetDecode);
        if (ret < 0) LOGE(TAG, "[decode] avcodec_send_packet error: %s", av_err2str(ret));

            /* read all the output frames (in general there may be any number of them) */
        while (ret >= 0)
        {
            ret = avcodec_receive_frame(contextDecoder, frameDecode);
            if (ret < 0 && ret != AVERROR(EAGAIN) && ret != AVERROR_EOF) LOGE(TAG, "[decode] avcodec_receive_frame error: %s", av_err2str(ret));
            if (ret < 0) break;

            std::vector<uint8_t> resampledData = resamplerDecoder->convert(frameDecode->data, frameDecode->nb_samples, frameDecode->channels, AV_SAMPLE_FMT_S16);
            if (!resampledData.size()) continue;
            std::copy(&resampledData.data()[0], &resampledData.data()[resampledData.size()], std::back_inserter(decodedData));
        }
    }
}

UPD 30.05.2020

I decided to stop using FFmpeg in my project and use libopus 1.3.1 instead, so I wrote a wrapper around it, and it works fine.
