Filling up shuffle buffer (this may take a while)

moh*_*dkh 6 python bigdata deep-learning keras tensorflow

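I have a dataset of video frames taken from 1000 real videos and 1000 deepfake videos. In the preprocessing phase each video is converted to 300 frames; in other words, I have a dataset with 300,000 images labeled Real (0) and 300,000 images labeled Fake (1). I want to train MesoNet on this data. I use a custom DataGenerator class to handle the train, validation, and test data with a ratio of 0.8, 0.1, 0.1, but when I run the project it shows the following message: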

Filling up shuffle buffer (this may take a while):

What can I do to solve this problem?
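For context, the message appears while a shuffle step fills its buffer, and nothing is emitted until that buffer is full. The following is a minimal pure-Python sketch of that idea, not TensorFlow's actual implementation; `buffered_shuffle`, `items_read_before_first_output`, and `buffer_size` are hypothetical names used only for illustration.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in pseudo-random order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        # Nothing is yielded until the buffer holds buffer_size items;
        # with a very large buffer this filling phase is the slow part.
        if len(buffer) >= buffer_size:
            idx = rng.randrange(len(buffer))
            # Swap the chosen element to the end and pop it (O(1) removal).
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Input exhausted: drain whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


def items_read_before_first_output(n_items, buffer_size):
    """Count how many inputs are consumed before the first output appears."""
    consumed = 0

    def counting_source():
        nonlocal consumed
        for i in range(n_items):
            consumed += 1
            yield i

    next(buffered_shuffle(counting_source(), buffer_size, seed=0))
    return consumed
```

Under these assumptions, a buffer of 100 over 1000 items reads 100 items before the first output, while a buffer as large as the whole dataset would have to read every item first, which is where the waiting time would come from if each item is an image loaded from disk.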

You can see the DataGenerator class below.

import cv2
import numpy as np
from tensorflow import keras  # assumed import; plain `import keras` also works


class DataGenerator(keras.utils.Sequence):
    """Generates data for Keras."""

    def __init__(self, df, labels, batch_size=32, img_size=(224, 224),
                 n_classes=2, shuffle=True):
        """Initialization."""
        self.batch_size = batch_size
        self.labels = labels
        self.df = df
        self.img_size = img_size
        self.n_classes = n_classes
        self.shuffle = shuffle
        self.batch_labels = []
        self.batch_names = []
        self.on_epoch_end()

    def __len__(self):
        """Denotes the number of batches per epoch."""
        return int(np.floor(len(self.df) / self.batch_size))

    def __getitem__(self, index):
        """Loads and returns one batch of images and labels."""
        # Rows of the DataFrame that belong to this batch.
        batch_index = self.indexes[index * self.batch_size : (index + 1) * self.batch_size]
        frame_paths = self.df.iloc[batch_index]["framePath"].values
        frame_label = self.df.iloc[batch_index]["label"].values

        imgs = [cv2.imread(frame) for frame in frame_paths]
        imgs = [cv2.cvtColor(img, cv2.COLOR_BGR2RGB) for img in imgs]
        # Resize every image. The original `if` clause filtered images out of
        # the list instead of skipping the resize, which could leave the batch
        # with fewer images than labels.
        imgs = [cv2.resize(img, self.img_size) for img in imgs]
        batch_imgs = np.asarray(imgs)
        labels = list(map(int, frame_label))
        y = np.array(labels)
        self.batch_labels.extend(labels)
        # Keep only the file name (paths use Windows-style separators).
        self.batch_names.extend([str(frame).split("\\")[-1] for frame in frame_paths])

        return batch_imgs, y

    def on_epoch_end(self):
        """Updates indexes after each epoch."""
        self.indexes = np.arange(len(self.df))
        if self.shuffle:
            np.random.shuffle(self.indexes)
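The 0.8/0.1/0.1 split mentioned above is not shown in the question. One way it might be done is sketched below, assuming the frames are split by row index; `split_indices` and its `seed` parameter are hypothetical and not part of the original code.

```python
import random


def split_indices(n_samples, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle sample indices and cut them into train/val/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # round() guards against floating-point error in the products.
    n_train = round(n_samples * ratios[0])
    n_val = round(n_samples * ratios[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder absorbs any rounding
    return train, val, test


# 300,000 real + 300,000 fake frames, as described in the question.
train_idx, val_idx, test_idx = split_indices(600_000)
```

Each index list could then select rows with `df.iloc[train_idx]` (and likewise for the other two) to build one DataGenerator per subset.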