Video processing with multiprocessing

Question


hen*_*nry 2 python opencv multiprocessing imutils

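I want to process a video by comparing adjacent frames. More specifically, I want to compute the mean squared error between adjacent frames: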

mean_squared_error(prev_frame,frame)


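I know how to compute it in a plain, sequential way: I use the imutils package, which uses a queue to decouple loading frames from processing them. Because the frames are stored in a queue, I don't have to wait for them to load before processing them... but I want it to be faster.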
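
The decoupling described above is a producer/consumer pattern: a reader thread pushes frames into a bounded queue while the main thread takes them off and computes the metric. The following is a minimal standard-library sketch of that idea, not imutils' actual code; the names reader and stream_frame_mse are hypothetical, the frames argument is assumed to be any iterable of NumPy arrays, and maxsize=128 is just an example value.

```python
import queue
import threading

import numpy as np

_SENTINEL = None  # marks the end of the stream


def reader(frames, q):
    """Producer: push every frame into the queue, then the end marker."""
    for frame in frames:
        q.put(frame)  # blocks while the queue is full
    q.put(_SENTINEL)


def stream_frame_mse(frames, maxsize=128):
    """Consumer: compute the MSE between adjacent frames as they arrive."""
    q = queue.Queue(maxsize=maxsize)
    t = threading.Thread(target=reader, args=(frames, q), daemon=True)
    t.start()

    results = []
    prev = q.get()
    while prev is not _SENTINEL:
        frame = q.get()
        if frame is _SENTINEL:
            break
        # cast to float so uint8 arithmetic cannot wrap around
        diff = prev.astype(np.float64) - frame
        results.append(float(np.mean(np.square(diff))))
        prev = frame
    t.join()
    return results


frames = [np.full((2, 2), v, dtype=np.uint8) for v in (0, 1, 3)]
print(stream_frame_mse(frames))  # [1.0, 4.0]
```

This overlaps reading with computing, but the metric itself still runs in a single thread of a single process, so it does not spread that work over several CPU cores.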

# import the necessary packages to read the video
from imutils.video import FileVideoStream
# package to compute the mean squared error
from skimage.metrics import mean_squared_error

if __name__ == '__main__':

    # SPECIFY PATH TO VIDEO FILE
    path_video = "VIDEO_PATH.mp4"

    # START IMUTILS VIDEO STREAM (frames are read in a background thread)
    print("[INFO] starting video file thread...")
    fvs = FileVideoStream(path_video).start()

    # INITIALIZE LIST to store the results
    mean_square_error_list = []

    # READ PREVIOUS FRAME
    prev_frame = fvs.read()

    # LOOP over frames from the video file stream
    while fvs.more():

        # GRAB THE NEXT FRAME from the threaded video file stream
        frame = fvs.read()
        if frame is None:  # the stream can hand back None once the video ends
            break

        # COMPUTE the metric
        metric_val = mean_squared_error(prev_frame, frame)
        mean_square_error_list.append(metric_val)  # append to list

        # UPDATE previous frame variable
        prev_frame = frame

    # stop the background reading thread
    fvs.stop()


My question is: how can I use multiprocessing to compute the metric faster and save time?

My OS is Windows 10 and I'm using Python 3.8.0.

Answer 1

Zab*_*azi 6

There are many ways to make this faster; I'll focus only on the multiprocessing part.

Since you don't want to read the whole video at once, we have to read it frame by frame.
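
Reading frame by frame means only two frames ever need to be in memory: the previous one and the current one. A small sketch of that pairing as a generator over any iterable (the name adjacent_pairs is hypothetical; in practice the items could come from a cv2.VideoCapture read loop). itertools.pairwise does the same thing, but it was added in Python 3.10 and the question targets Python 3.8.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def adjacent_pairs(frames: Iterable[T]) -> Iterator[Tuple[T, T]]:
    """Yield (prev_frame, frame) for each pair of consecutive items.

    Only the previous item is kept, so memory use stays at two frames
    no matter how long the video is.
    """
    it = iter(frames)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty input: no pairs
    for frame in it:
        yield prev, frame
        prev = frame


print(list(adjacent_pairs("abcd")))  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```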

I'll use OpenCV (cv2) and NumPy to read the frames, compute the MSE, and save the MSE values to disk.
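
One detail matters when computing the MSE with plain NumPy: cv2 returns frames as uint8 arrays, and on uint8 both the subtraction and the squaring silently wrap around modulo 256. A minimal sketch of a safe per-frame MSE (the helper name frame_mse is hypothetical) that converts to float first; as far as I know, skimage.metrics.mean_squared_error from the question also converts to floating point before differencing, so the two should agree.

```python
import numpy as np


def frame_mse(prev_frame: np.ndarray, frame: np.ndarray) -> float:
    """Mean squared error between two equally shaped frames."""
    # Convert to float first: on uint8 arrays the subtraction and the
    # squaring wrap around modulo 256 instead of raising an error.
    diff = prev_frame.astype(np.float64) - frame.astype(np.float64)
    return float(np.mean(np.square(diff)))


a = np.array([[0, 10]], dtype=np.uint8)
b = np.array([[20, 10]], dtype=np.uint8)
print(frame_mse(a, b))                   # 200.0
print(float(np.mean(np.square(a - b))))  # 72.0 (wrapped-around uint8 result)
```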

First, let's start without any multiprocessing so we have a baseline to benchmark against. I'm using a video with 1920 x 1080 resolution, 60 FPS, a duration of 1:29 and a size of 100 MB.

import os
import time

import cv2
import numpy as np

filename = '2.mp4'

def process_video():
    cap = cv2.VideoCapture(filename)

    proc_frames = 0

    mse = []
    prev_frame = None
    while True:
        ret, frame = cap.read()  # read frames sequentially
        if not ret:  # end of the video (or a read error)
            break

        if prev_frame is not None:
            # cast to float first: subtracting uint8 frames would wrap around
            c_mse = np.mean(np.square(prev_frame.astype(np.float64) - frame))
            mse.append(c_mse)

        prev_frame = frame

        proc_frames += 1

    # make sure the output folder exists before saving
    os.makedirs('data', exist_ok=True)
    np.save(os.path.join('data', 'sp.npy'), np.array(mse))

    cap.release()


if __name__ == "__main__":

    t1 = time.time()

    process_video()

    t2 = time.time()

    print(t2 - t1)


On my system, this took 142 seconds.

Now we can take the multiprocessing approach. The idea can be summarized by the figure below.

[GIF, credit: Google]

We split the video into segments (one per CPU core) and process the frames of these segments in parallel.
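
The split can be written down explicitly. A sketch, assuming the total frame count is known up front (segment_ranges and boundary_pairs are hypothetical helper names): each segment gets total // n frames, the last one also takes the remainder, and the MSE across each of the n - 1 split points has to be computed separately, because no single worker sees both frames of such a pair.

```python
def segment_ranges(total_frames, num_segments):
    """Split frame indices [0, total_frames) into contiguous (start, stop) ranges.

    Every segment gets total_frames // num_segments frames; the last one also
    takes the leftover frames, so nothing at the end of the video is dropped.
    """
    unit = total_frames // num_segments
    ranges = []
    for i in range(num_segments):
        start = i * unit
        stop = total_frames if i == num_segments - 1 else start + unit
        ranges.append((start, stop))
    return ranges


def boundary_pairs(ranges):
    """(last frame of segment i, first frame of segment i + 1) for each split point."""
    return [(stop - 1, next_start)
            for (_, stop), (next_start, _) in zip(ranges, ranges[1:])]


ranges = segment_ranges(10, 3)
print(ranges)                  # [(0, 3), (3, 6), (6, 10)]
print(boundary_pairs(ranges))  # [(2, 3), (5, 6)]
```

Counting check: each segment of length L yields L - 1 MSE values, and adding the n - 1 boundary values gives total - 1, the same number the single-process version produces.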

import os
import time

import cv2
import numpy as np
import multiprocessing as mp

filename = '2.mp4'

def frame_mse(prev_frame, frame):
    # cast to float first: subtracting uint8 frames would wrap around
    return np.mean(np.square(prev_frame.astype(np.float64) - frame))

def process_video(group_number):
    cap = cv2.VideoCapture(filename)
    num_processes = mp.cpu_count()
    frame_jump_unit = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) // num_processes
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_jump_unit * group_number)
    # the last segment also takes the leftover frames (frame count % num_processes)
    is_last = group_number == num_processes - 1
    proc_frames = 0

    mse = []
    prev_frame = None
    while is_last or proc_frames < frame_jump_unit:
        ret, frame = cap.read()
        if not ret:
            break

        if prev_frame is not None:
            mse.append(frame_mse(prev_frame, frame))

        prev_frame = frame

        proc_frames += 1

    np.save(os.path.join('data', f'{group_number}.npy'), np.array(mse))

    cap.release()


if __name__ == "__main__":

    t1 = time.time()

    num_processes = mp.cpu_count()
    print(f'CPU: {num_processes}')

    # only read the metadata here
    cap = cv2.VideoCapture(filename)
    frame_jump_unit = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) // num_processes
    cap.release()

    os.makedirs('data', exist_ok=True)
    with mp.Pool(num_processes) as p:
        p.map(process_video, range(num_processes))

    # merging: each segment misses the MSE across its boundary with the
    # next segment, so compute those num_processes - 1 values here
    final_mse = []
    cap = cv2.VideoCapture(filename)  # opened once, outside the loop
    for i in range(num_processes):
        final_mse.extend(np.load(os.path.join('data', f'{i}.npy')))

        if i == num_processes - 1:
            break  # no boundary after the last segment

        frame_no = frame_jump_unit * (i + 1) - 1  # last frame of segment i
        print(frame_no)
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_no)
        ret1, frame1 = cap.read()
        ret2, frame2 = cap.read()  # first frame of segment i + 1
        if ret1 and ret2:
            final_mse.append(frame_mse(frame1, frame2))
        else:
            print(f'failed to read the boundary frames at {frame_no}')
    cap.release()

    t2 = time.time()

    print(t2 - t1)

    np.save(os.path.join('data', 'final_mse.npy'), np.array(final_mse))


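I just used numpy's save to store the partial results; you could try something better (for example, returning each segment's array from the worker and collecting them from p.map, which returns results in input order).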

This version ran in 49.56 seconds with cpu_count = 12. There are certainly some bottlenecks that could be removed to make it faster.

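The only problem with my implementation is that it misses the MSE values at the points where the video is split, but these are easy to add. Because OpenCV can seek to an arbitrary frame position (cap.set with cv2.CAP_PROP_POS_FRAMES) instead of reading the video from the start, we can go to those positions, compute the missing MSE values individually and merge them into the final result. How precisely a seek lands can depend on the codec and backend, which is one more reason to run the sanity check below. [Check the updated code; it fixes the merging part.]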
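
The merge bookkeeping can be checked on synthetic data, without any video or worker processes. A sketch under that assumption: random uint8 arrays stand in for frames, and the hypothetical helpers mse_list and merged_mse mimic what each worker and the merge step compute, so the stitched result can be compared with the serial one.

```python
import numpy as np


def mse_list(frames):
    """Serial MSE between adjacent frames of a sequence."""
    return [float(np.mean(np.square(a.astype(np.float64) - b)))
            for a, b in zip(frames[:-1], frames[1:])]


def merged_mse(frames, num_segments):
    """Per-segment MSE lists stitched together with the boundary MSEs."""
    total = len(frames)
    unit = total // num_segments
    merged = []
    for i in range(num_segments):
        start = i * unit
        stop = total if i == num_segments - 1 else start + unit
        merged.extend(mse_list(frames[start:stop]))  # what worker i computes
        if i < num_segments - 1:
            # boundary: last frame of segment i vs first frame of segment i + 1
            merged.extend(mse_list(frames[stop - 1:stop + 1]))
    return merged


rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(11, 4, 4, 3), dtype=np.uint8)
print(merged_mse(frames, 3) == mse_list(frames))  # True
```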

You can write a simple sanity check to make sure both versions give the same results.

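import numpy as np

a = np.load('data/sp.npy')         # single-process result

b = np.load('data/final_mse.npy')  # multiprocessing result

print(a.shape)

print(b.shape)

print(a[:10])

print(b[:10])

# print the indices where the two results differ (only if the lengths match)
if a.shape == b.shape:
    for i in np.flatnonzero(a != b):
        print(i)
else:
    print('different lengths: some MSE values are missing or extra')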


Further speed-ups could come from using OpenCV compiled with CUDA, using ffmpeg, adding a queuing mechanism on top of multiprocessing, and so on.

Archived: 5 years, 8 months ago
Views: 794
Last activity: 5 years, 8 months ago