FFMPEG: muxing streams with different durations

Mic*_* — tags: c++, audio, video, ffmpeg, libavformat

I am muxing a video stream and an audio stream. The video stream comes from generated image data; the audio stream comes from an AAC file. Some audio files are longer than the total video time I set, so my strategy is to stop muxing the audio stream once its time becomes greater than the total video time (the latter is controlled by the number of encoded video frames).

I won't put the whole setup code here, but it is similar to the muxing.c example from the latest FFmpeg repo. The only difference is that I use an audio stream from a file, as I said, not synthetically generated encoded frames. I am pretty sure the problem is my incorrect synchronization during the muxer loop. Here is what I do:

bool AudioSetup(const char* audioInFileName)
{
    AVOutputFormat* outputF = mOutputFormatContext->oformat;
    auto audioCodecId = outputF->audio_codec;

    if (audioCodecId == AV_CODEC_ID_NONE) {
        return false;
    }

    audio_codec = avcodec_find_encoder(audioCodecId);

    avformat_open_input(&mInputAudioFormatContext, audioInFileName, 0, 0);
    avformat_find_stream_info(mInputAudioFormatContext, 0);

    av_dump_format(mInputAudioFormatContext, 0, audioInFileName, 0);

    for (size_t i = 0; i < mInputAudioFormatContext->nb_streams; i++) {
        if (mInputAudioFormatContext->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
            inAudioStream = mInputAudioFormatContext->streams[i];

            AVCodecParameters* in_codecpar = inAudioStream->codecpar;
            mAudioOutStream.st = avformat_new_stream(mOutputFormatContext, NULL);
            mAudioOutStream.st->id = mOutputFormatContext->nb_streams - 1;
            AVCodecContext* c = avcodec_alloc_context3(audio_codec);
            mAudioOutStream.enc = c;
            c->sample_fmt = audio_codec->sample_fmts[0];
            avcodec_parameters_to_context(c, inAudioStream->codecpar);
            // copy params from the input to the output audio stream:
            avcodec_parameters_copy(mAudioOutStream.st->codecpar, inAudioStream->codecpar);

            mAudioOutStream.st->time_base.num = 1;
            mAudioOutStream.st->time_base.den = c->sample_rate;

            c->time_base = mAudioOutStream.st->time_base;

            if (mOutputFormatContext->oformat->flags & AVFMT_GLOBALHEADER) {
                c->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
            }
            break;
        }
    }

    return true;
}

bool Encode()
{
    int cc = av_compare_ts(mVideoOutStream.next_pts, mVideoOutStream.enc->time_base,
                           mAudioOutStream.next_pts, mAudioOutStream.enc->time_base);

    if (mAudioOutStream.st == NULL || cc <= 0) {
        uint8_t* data = GetYUVFrame(); // returns a ready video YUV frame to work with
        int ret = 0;
        AVPacket pkt = { 0 };
        av_init_packet(&pkt);
        pkt.size = packet->dataSize; // 'packet' holds the source frame data (declared elsewhere)
        pkt.data = data;
        const int64_t duration = av_rescale_q(1, mVideoOutStream.enc->time_base, mVideoOutStream.st->time_base);

        pkt.duration = duration;
        pkt.pts = mVideoOutStream.next_pts;
        pkt.dts = mVideoOutStream.next_pts;
        mVideoOutStream.next_pts += duration;

        pkt.stream_index = mVideoOutStream.st->index;
        ret = av_interleaved_write_frame(mOutputFormatContext, &pkt);
    } else if (audio_time < video_time) {
        // 5 - duration of the video in seconds
        AVRational r = { 60, 1 };

        auto cmp = av_compare_ts(mAudioOutStream.next_pts, mAudioOutStream.enc->time_base, 5, r);
        if (cmp >= 0) {
            mAudioOutStream.next_pts = std::numeric_limits<int64_t>::max();
            return true; // don't mux audio anymore
        }

        AVPacket a_pkt = { 0 };
        av_init_packet(&a_pkt);

        int ret = 0;
        ret = av_read_frame(mInputAudioFormatContext, &a_pkt);
        // if the audio file is shorter, stop muxing at the end of the file
        if (ret == AVERROR_EOF) {
            mAudioOutStream.next_pts = std::numeric_limits<int64_t>::max();
            return true;
        }
        a_pkt.stream_index = mAudioOutStream.st->index;

        av_packet_rescale_ts(&a_pkt, inAudioStream->time_base, mAudioOutStream.st->time_base);
        mAudioOutStream.next_pts += a_pkt.pts;

        ret = av_interleaved_write_frame(mOutputFormatContext, &a_pkt);
    }

    return false; // keep muxing
}

Now, the video part is flawless. But if the audio track is longer than the video duration, the total movie length grows by roughly 5% to 20%, and it is clear the audio is contributing to that, because the video frames finish exactly where they should.

The closest "hack" I came up with is this part:

AVRational r = { 60, 1 };
auto cmp= av_compare_ts(mAudioOutStream.next_pts, mAudioOutStream.enc->time_base, 5, r);
if (cmp >= 0) {
    mAudioOutStream.next_pts = (int64_t)std::numeric_limits<int64_t>::max();
    return true;
} 

Here I tried to compare the next_pts of the audio stream against the total time set for the video file, which is 5 seconds. By setting r = {60, 1} I was converting those seconds into the audio stream's time_base. At least that's what I believed I was doing. With this hack I get a very small deviation from the correct movie length when using a standard AAC file, i.e. sample rate 44100, stereo. But if I test with more problematic samples, e.g. AAC with a sample rate of 16000, mono, then the video file grows by almost a full second. I would appreciate it if someone could point out what I am doing wrong here.

Important note: I don't set a duration on any of the contexts. I control the termination of the muxing session, and it is based on the video frame count. The audio input stream has a duration, of course, but it doesn't help me, since the video duration is what defines the movie length.

Update:

This is the second bounty attempt.

Update 2:

Actually, my {den, num} for the audio timestamp was wrong, and {1, 1} is indeed the way to go, as the answer explains. What kept it from working was a bug in this line (my bad):

     mAudioOutStream.next_pts += a_pkt.pts;

It had to be:

     mAudioOutStream.next_pts = a_pkt.pts;

The bug caused a runaway increment of pts (each packet's absolute pts was added instead of assigned), so the end of the stream was reached, in pts terms, far too early, and the audio stream therefore terminated much sooner than it should have.

Max*_*mer 4

The problem is that you're telling av_compare_ts to compare the given audio time against 5 ticks at 60 seconds per tick. I'm actually surprised it works in some cases, but I guess that really depends on the particular time_base of the given audio stream.

Let's assume the audio has a time_base of 1/25 and the stream is at 6 seconds, which is past your limit, so you want av_compare_ts to return 0 or 1. Given these conditions, you will have the following values:

    mAudioOutStream.next_pts = 150
    mAudioOutStream.enc->time_base = 1/25

Thus you call av_compare_ts with the following parameters:

    ts_a = 150
    tb_a = 1/25
    ts_b = 5
    tb_b = 60/1

Now let's look at the implementation of av_compare_ts:

    int av_compare_ts(int64_t ts_a, AVRational tb_a, int64_t ts_b, AVRational tb_b)
    {
        int64_t a = tb_a.num * (int64_t)tb_b.den;
        int64_t b = tb_b.num * (int64_t)tb_a.den;
        if ((FFABS(ts_a)|a|FFABS(ts_b)|b) <= INT_MAX)
            return (ts_a*a > ts_b*b) - (ts_a*a < ts_b*b);
        if (av_rescale_rnd(ts_a, a, b, AV_ROUND_DOWN) < ts_b)
            return -1;
        if (av_rescale_rnd(ts_b, b, a, AV_ROUND_DOWN) < ts_a)
            return 1;
        return 0;
    }

Given the values above, you get:

    a = 1 * 1 = 1
    b = 60 * 25 = 1500

Then av_rescale_rnd is called with these parameters:

    a = 150
    b = 1
    c = 1500
    rnd = AV_ROUND_DOWN

Given our parameters, we can effectively reduce the whole av_rescale_rnd function to the following line (I won't copy the whole function body, since it is rather long, but you can look up av_rescale_rnd here):

    return (a * b) / c;

This returns (150 * 1) / 1500, which is 0.

So av_rescale_rnd(ts_a, a, b, AV_ROUND_DOWN) < ts_b resolves to true, because 0 is less than ts_b (5), and thus av_compare_ts returns -1, which is not at all what you want.

If you change your r to 1/1 it should work, because now your 5 will actually be treated as 5 seconds:

    ts_a = 150
    tb_a = 1/25
    ts_b = 5
    tb_b = 1/1

In av_compare_ts we now get:

    a = 1 * 1 = 1
    b = 1 * 25 = 25

Then av_rescale_rnd is called with these parameters:

    a = 150
    b = 1
    c = 25
    rnd = AV_ROUND_DOWN

This returns (150 * 1) / 25, which is 6.

6 is greater than 5, so the condition fails, and av_rescale_rnd is called again, this time with:

    a = 5
    b = 25
    c = 1
    rnd = AV_ROUND_DOWN

This returns (5 * 25) / 1, which is 125. That is smaller than 150, so 1 is returned, and voilà, your problem is solved.

If step_size is greater than 1

If the step_size of your audio stream is not 1, you have to modify your r to account for that, e.g. for step_size = 1024:

    r = { 1, 1024 };

Let's quickly review what happens now.

At roughly 6 seconds:

    mAudioOutStream.next_pts = 282
    mAudioOutStream.enc->time_base = 1/48000

av_compare_ts gets the following parameters:

    ts_a = 282
    tb_a = 1/48000
    ts_b = 5
    tb_b = 1/1024

Thus:

    a = 1 * 1024 = 1024
    b = 1 * 48000 = 48000

And in av_rescale_rnd:

    a = 282
    b = 1024
    c = 48000
    rnd = AV_ROUND_DOWN

(a * b) / c gives (282 * 1024) / 48000 = 288768 / 48000, which is 6.

With r = {1, 1} you would again get 0, because it would calculate (282 * 1) / 48000.
