FFmpeg library: how to seek precisely in an audio file

Question

In my Android application, which uses the FFmpeg library, I am trying to understand how to seek to a very precise position in an audio file.

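For example, I want to set the current position in the file to sample frame #1234567 (in a file encoded at 44100 Hz), which corresponds to a seek to 27994.717 ms (1234567 / 44100 s).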
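As a cross-check of that arithmetic, here is a minimal sketch in plain C (no FFmpeg calls; the helper names samples_to_us and us_to_samples are hypothetical) that converts between a sample index and the microsecond value that av_seek_frame expects when stream_index is -1 (AV_TIME_BASE units). It uses 64-bit integer math and rounds to the nearest microsecond, which reproduces 27994717 µs for sample 1234567 at 44100 Hz.

```c
#include <assert.h>
#include <stdint.h>

/* Sample index -> microseconds, rounded to the nearest microsecond.
 * Pure integer math, so no float precision is lost. */
int64_t samples_to_us(int64_t sample_index, int32_t sample_rate)
{
    assert(sample_index >= 0 && sample_rate > 0);
    return (sample_index * 1000000 + sample_rate / 2) / sample_rate;
}

/* Microseconds -> nearest sample index (the inverse conversion). */
int64_t us_to_samples(int64_t us, int32_t sample_rate)
{
    assert(us >= 0 && sample_rate > 0);
    return (us * sample_rate + 500000) / 1000000;
}
```

One sample at 44100 Hz lasts about 22.7 µs, so microsecond resolution is already fine enough to identify a single sample; the unit passed to the seek call is not what limits accuracy here.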

To do this, I tried the following:

// this:
av_seek_frame(formatContext, -1, 27994717, 0);

// or this:
av_seek_frame(formatContext, -1, 27994717, AVSEEK_FLAG_ANY);

// or even this:
avformat_seek_file(formatContext, -1, 27994617, 27994717, 27994817, 0);

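Passing the position in microseconds has given me the best results so far.

However, for some reason the positioning is not exactly accurate: when I extract samples from the audio file, they do not start exactly at the expected position. There is a small delay of about 30-40 ms (surprisingly, even when I seek to position 0...).

Am I using the function the right way? Is it even the right function?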

Edit

Here is how I get the position:

AVPacket packet;
AVStream *stream = NULL;
AVFormatContext *formatContext = NULL;
AVCodec *dec = NULL;

// initialization:
avformat_open_input(&formatContext, filename, NULL, NULL);
avformat_find_stream_info(formatContext, NULL);
int audio_stream_index = av_find_best_stream(formatContext, AVMEDIA_TYPE_AUDIO, -1, -1, &dec, 0);
stream = formatContext->streams[audio_stream_index];

...

// later, when I extract samples, here is how I get my position, in microseconds:
av_read_frame(formatContext, &packet);
long position = (long) (1000000 * (packet.pts * ((float) stream->time_base.num / stream->time_base.den)));

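With that code, I can get the position of the beginning of the current frame (here a frame is a block of samples whose size depends on the audio format: 1152 samples for mp3, 128 to 1152 for ogg, ...).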
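One detail worth tightening in the position computation above: it goes through float (about 7 significant decimal digits) and stores the result in a long, which is only 32 bits on 32-bit Android ABIs and overflows after roughly 35 minutes' worth of microseconds. With FFmpeg itself, the usual integer-only form is av_rescale_q(packet.pts, stream->time_base, AV_TIME_BASE_Q), after checking that packet.pts is not AV_NOPTS_VALUE. The plain-C sketch below shows the same computation without FFmpeg; the rational struct and the pts_to_us name are hypothetical stand-ins for AVRational and av_rescale_q, and it assumes a non-negative pts small enough that pts * num * 1000000 fits in int64_t. This removes precision loss as a variable, but it is not by itself an explanation of the ~30 ms offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FFmpeg's AVRational time base. */
typedef struct {
    int num;
    int den;
} rational;

/* Timestamp in time base tb -> microseconds, rounded to nearest
 * (halfway cases rounded up), using only 64-bit integers.
 * Assumes pts >= 0 and that pts * tb.num * 1000000 fits in int64_t. */
int64_t pts_to_us(int64_t pts, rational tb)
{
    assert(pts >= 0 && tb.num > 0 && tb.den > 0);
    /* pts * num / den seconds == pts * num * 1000000 / den microseconds */
    int64_t scaled = pts * tb.num * 1000000;
    return (scaled + tb.den / 2) / tb.den;
}
```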

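The problem is that the position value I get is not accurate: it is actually about 30 ms late. For example, when it reports 1000000, the actual position is around 1030000...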

What am I doing wrong? Is this a bug in FFmpeg?

Thanks for your help.

Answer 1

bar*_*que 5

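Late, but hopefully this helps someone. The idea is to save the target timestamp when seeking, then compare AVPacket->pts with that value (you could use AVStream->dts for this, but in my experiments it did not give good results). If pts is still below the target timestamp, use the AV_PKT_DATA_SKIP_SAMPLES entry of AVPacket->side_data to drop the extra samples.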
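The core of this idea is turning "how far the packet's pts is behind the target" into a number of samples. A minimal integer-only sketch in plain C, assuming both timestamps are in the stream time base (tb_num/tb_den) and that the decoder outputs sample_rate samples per second; samples_to_skip and its parameters are hypothetical names, not FFmpeg API. Clamping to the packet's sample count is a design choice of this sketch (it assumes one packet decodes to one frame) and also keeps the value well inside the 32-bit side-data field.

```c
#include <assert.h>
#include <stdint.h>

/* Samples to drop from the start of a packet so that output begins at
 * target_ts. Timestamps are in the stream time base tb_num/tb_den;
 * sample_rate is the decoder's output rate. Result is clamped to
 * [0, packet_samples]. */
int64_t samples_to_skip(int64_t target_ts, int64_t pkt_pts,
                        int tb_num, int tb_den, int sample_rate,
                        int64_t packet_samples)
{
    assert(tb_num > 0 && tb_den > 0 && sample_rate > 0);
    if (pkt_pts >= target_ts)
        return 0; /* packet already starts at or after the target */

    int64_t delta = target_ts - pkt_pts; /* in stream time-base ticks */
    /* ticks * (num / den) seconds * sample_rate samples/s, rounded */
    int64_t skip = (delta * tb_num * sample_rate + tb_den / 2) / tb_den;
    return skip > packet_samples ? packet_samples : skip;
}
```

For example, with a 1/44100 time base, a packet at pts 1233792 and a target of 1234567, 775 samples are dropped from that packet.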

Code for the seek method:

void audio_decoder::seek(float seconds) {
    auto stream = m_format_ctx->streams[m_packet->stream_index];

    // convert seconds provided by the user to a timestamp in a correct base,
    // then save it for later.
    m_target_ts = av_rescale_q(seconds * AV_TIME_BASE, AV_TIME_BASE_Q, stream->time_base);

    avcodec_flush_buffers(m_codec_ctx.get());

    // Here we seek within given stream index and the correct timestamp
    // for that stream. Using AVSEEK_FLAG_BACKWARD to make sure we're
    // always *before* requested timestamp.
    if (int err = av_seek_frame(m_format_ctx.get(), m_packet->stream_index, m_target_ts, AVSEEK_FLAG_BACKWARD)) {
        error("audio_decoder: Error while seeking ({})", av_err_str(err));
    }
}
And the code for the decode method:

#include <cstdint>
extern "C" {
#include <libavutil/intreadwrite.h>  // AV_WL32
}

void audio_decoder::decode() {
    <...>
    while (is_decoding) {
        // Read data as usual.
        av_read_frame(m_format_ctx.get(), m_packet.get());

        // Here is the juicy part. We were seeking, but the seek
        // wasn't precise enough, so we need to drop some samples.
        if (m_packet->pts > 0 && m_target_ts > 0 && m_packet->pts < m_target_ts) {
            auto stream = m_format_ctx->streams[m_packet->stream_index];

            // Convert the timestamp delta (stream time base, num/den) to seconds,
            // then to a sample count at the decoder's output sample rate.
            double time_delta = static_cast<double>(m_target_ts - m_packet->pts)
                                * stream->time_base.num / stream->time_base.den;
            int64_t skip_samples = static_cast<int64_t>(time_delta * m_codec_ctx->sample_rate);

            // Next step: attach side data to the packet;
            // it tells the decoder to drop samples.
            uint8_t *data = av_packet_get_side_data(m_packet.get(), AV_PKT_DATA_SKIP_SAMPLES, nullptr);
            if (!data) {
                data = av_packet_new_side_data(m_packet.get(), AV_PKT_DATA_SKIP_SAMPLES, 10);
            }
            if (data) {
                // Side data layout: u32le samples to skip from the start,
                // u32le samples to skip from the end, u8 reason for each. See:
                // https://ffmpeg.org/doxygen/trunk/group__lavc__packet.html#ga9a80bfcacc586b483a973272800edb97
                AV_WL32(data, static_cast<uint32_t>(skip_samples));  // portable little-endian write
                data[8] = 0;
            }
        }

        // Send the packet as usual.
        avcodec_send_packet(m_codec_ctx.get(), m_packet.get());

        // Proceed to receiving frames as usual; nothing to change there.
    }
    <...>
}
If this is unclear without context, you can check the same code in audio_decoder.cpp in my project.
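For reference, the AV_PKT_DATA_SKIP_SAMPLES payload written in decode() has a fixed 10-byte layout documented by FFmpeg: a u32le count of samples to skip from the start of the packet, a u32le count to skip from the end, then one u8 reason byte for each. The sketch below fills such a buffer with explicit little-endian byte writes, so it depends neither on host endianness nor on the alignment of a uint32_t cast, and it sets all ten bytes rather than only the first field. pack_skip_samples is a hypothetical helper; inside an FFmpeg build, AV_WL32 from libavutil/intreadwrite.h serves the same purpose for the 32-bit fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill a 10-byte AV_PKT_DATA_SKIP_SAMPLES-style payload:
 * bytes 0-3: u32le samples to skip at the start
 * bytes 4-7: u32le samples to skip at the end
 * byte 8:    reason for the start skip
 * byte 9:    reason for the end skip */
void pack_skip_samples(uint8_t out[10], uint32_t skip_start,
                       uint32_t skip_end, uint8_t start_reason,
                       uint8_t end_reason)
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        out[i]     = (uint8_t)(skip_start >> (8 * i)); /* least significant byte first */
        out[4 + i] = (uint8_t)(skip_end >> (8 * i));
    }
    out[8] = start_reason;
    out[9] = end_reason;
}
```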

Answer 2

sza*_*ary 3

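It depends on the codec. For example, AAC has a resolution of 1024 samples per frame regardless of the sample rate, and it can also have start-up (priming) samples that are meant to be discarded. MP3 has 1152 or 576 samples per frame, depending on the MPEG version (1152 for MPEG-1 Layer III, 576 for MPEG-2/2.5 Layer III).

If you need perfection, use an uncompressed format such as WAV (RIFF).

There is a way, but it requires adding a second step: after ffmpeg has decoded to the approximate position, you need to throw away the unwanted samples to achieve sub-frame seeking.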
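To make the "second step" concrete: if a stream is made of fixed-size frames, a frame-aligned seek can only land on a multiple of the frame size, and the remainder is exactly what has to be discarded after decoding. A minimal plain-C sketch, assuming frames start at sample 0 with no encoder delay or priming samples (which, as noted above for AAC, real files may have); align_to_frame and frame_aligned_seek are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Where a frame-aligned seek actually starts decoding, and how many
 * decoded samples must be dropped to reach the target sample. */
typedef struct {
    int64_t frame_start; /* first sample of the frame containing the target */
    int64_t discard;     /* samples to drop before the target sample */
} frame_aligned_seek;

/* Assumes fixed-size frames beginning at sample 0 (no priming/delay). */
frame_aligned_seek align_to_frame(int64_t target_sample,
                                  int64_t samples_per_frame)
{
    assert(target_sample >= 0 && samples_per_frame > 0);
    frame_aligned_seek r;
    r.frame_start = (target_sample / samples_per_frame) * samples_per_frame;
    r.discard = target_sample - r.frame_start;
    return r;
}
```

For sample 1234567 with 1152-sample frames, decoding starts at sample 1233792 and 775 samples (about 17.6 ms at 44100 Hz) must be dropped; the worst case is just under one frame, about 26.1 ms for 1152 samples at 44100 Hz.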

Archived: 7 years, 2 months ago
Views: 708
Last activity: 7 years, 2 months ago