How to build a custom data generator for Keras/tf.Keras where the X images are augmented and the corresponding Y labels are also images

Question


Des*_*wal 6 python deep-learning keras tensorflow tensorflow2.0

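I am using UNet for image binarization and have a dataset of 150 images together with their binarized versions. My idea is to augment the images randomly so that they look different, so I wrote a function that inserts 4-5 types of noise, skew, shearing, etc. into the images. I could easily have used ImageDataGenerator(preprocessing_function=my_aug_function) to augment the images, but the problem is that my y targets are also images. Alternatively, I could use something like: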

train_dataset = (
    train_dataset.map(
        encode_single_sample, num_parallel_calls=tf.data.experimental.AUTOTUNE
    )
    .batch(batch_size)
    .prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
)


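But it has two problems:

1. For larger datasets it runs out of memory, because the data has to be in memory already.
2. This is the crucial part: I need to augment the images on the fly so that it looks as if I had a huge dataset.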

Another solution would be to save augmented images to a directory, grow it to 30-40K images, and then load them. That would be silly.

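My current idea is to use Sequence as the parent class, but how do I keep augmenting and generating new images on the fly together with the corresponding binarized Y images?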

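I have an idea, shown in the code below. Can someone help me with augmenting and generating the y images? In my X_DIR and Y_DIR, the binarized and original images have the same names but are stored in different directories.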

import numpy as np
import tensorflow


class DataGenerator(tensorflow.keras.utils.Sequence):
    def __init__(self, files_path, labels_path, batch_size=32, shuffle=True, random_state=42):
        'Initialization'
        self.files = files_path
        self.labels = labels_path
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.random_state = random_state
        self.on_epoch_end()


    def on_epoch_end(self):
        'Updates indexes after each epoch'
        # Shuffle the data here
        pass


    def __len__(self):
        return int(np.floor(len(self.files) / self.batch_size))

    def __getitem__(self, index):
        # What do I do here?
        raise NotImplementedError


    def __data_generation(self, files):
        # I think this is responsible for augmentation, but I have no idea how to implement it or how it works.
        raise NotImplementedError

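Since the original and binarized images share file names across X_DIR and Y_DIR, the (x, y) pairs can be matched by name before any generator is involved. A minimal sketch, assuming both directories hold image files with identical names; the function name pair_image_paths and the extension list are hypothetical.

```python
import os

# hypothetical extension filter; adjust to the formats actually stored in X_DIR
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff")


def pair_image_paths(x_dir, y_dir):
    """Return sorted (x_path, y_path) tuples, matching files by identical name."""
    x_names = sorted(n for n in os.listdir(x_dir) if n.lower().endswith(IMAGE_EXTENSIONS))
    y_names = set(os.listdir(y_dir))
    missing = [n for n in x_names if n not in y_names]
    if missing:
        # fail early instead of silently training on misaligned pairs
        raise FileNotFoundError(f"no binarized image in {y_dir} for: {missing[:5]}")
    return [(os.path.join(x_dir, n), os.path.join(y_dir, n)) for n in x_names]
```

The resulting list can be shuffled or split into training and validation pairs and then handed to a generator such as the one above, so only the paths, not the decoded images, are held in memory.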

Answer 1

Dee*_*Raj 5

Custom image data generator

Load the directory data into a DataFrame for CustomDataGenerator

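import os

import pandas as pd


def data_to_df(data_dir, subset=None, validation_split=None, seed=42):
    # one sub-directory per class; name_to_idx maps the sorted folder names
    # to integer labels (assumption: class index = position in sorted order)
    class_names = sorted(
        d for d in os.listdir(data_dir) if os.path.isdir(os.path.join(data_dir, d))
    )
    name_to_idx = {name: idx for idx, name in enumerate(class_names)}

    filenames = []
    labels = []
    for class_name in class_names:
        class_dir = os.path.join(data_dir, class_name)
        for image in os.listdir(class_dir):
            filenames.append(os.path.join(class_dir, image))
            labels.append(name_to_idx[class_name])

    df = pd.DataFrame({"filenames": filenames, "labels": labels})

    if subset == "train":
        # rows are grouped by class, so shuffle before splitting;
        # otherwise the validation set only holds the first class(es)
        df = df.sample(frac=1, random_state=seed).reset_index(drop=True)
        split_index = int(len(df) * validation_split)
        train_df = df.iloc[split_index:].reset_index(drop=True)
        val_df = df.iloc[:split_index].reset_index(drop=True)
        return train_df, val_df

    return df

train_df, val_df = data_to_df(train_dir, subset="train", validation_split=0.2)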

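Because data_to_df appends rows one class directory at a time, the DataFrame is grouped by class, and a split that takes the first validation_split fraction of rows without shuffling would put only the first class (or classes) into validation. The snippet below illustrates this on made-up labels and shows a stratified alternative using pandas groupby(...).sample(...) (available since pandas 1.1), which takes the same fraction from every class. The file names and class sizes are invented for the illustration.

```python
import pandas as pd

# made-up data laid out class by class, the way data_to_df appends rows
df = pd.DataFrame({
    "filenames": [f"img_{i}.png" for i in range(30)],
    "labels": [0] * 10 + [1] * 10 + [2] * 10,
})

# unshuffled head split: the first 20% of rows all come from class 0
split_index = int(len(df) * 0.2)
head_labels = df.iloc[:split_index]["labels"].unique().tolist()
print(head_labels)  # [0]

# stratified split: 20% of every class goes to validation, the rest to training
val_df = df.groupby("labels").sample(frac=0.2, random_state=0)
train_df = df.drop(val_df.index)
print(val_df["labels"].value_counts().sort_index().tolist())  # [2, 2, 2]
print(len(train_df))  # 24
```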

Custom data generator


import math

import numpy as np
import tensorflow as tf
from PIL import Image
# assumption: preprocess_input must match the backbone being trained; ResNet50 is only an example
from tensorflow.keras.applications.resnet50 import preprocess_input


class CustomDataGenerator(tf.keras.utils.Sequence):

    ''' Custom DataGenerator to load images

    Arguments:
        data_frame = pandas DataFrame with "filenames" and "labels" columns
        batch_size = number of samples per batch
        img_shape = image shape in (h, w, d) format
        augmentation = apply random augmentation to make the model robust against overfitting
        num_classes = number of classes

    Output:
        img: batch of images as a tensor
        label: batch of labels as a tensor
    '''

    def __init__(self, data_frame, batch_size=10, img_shape=None, augmentation=True, num_classes=None):
        self.data_frame = data_frame.reset_index(drop=True)
        self.train_len = len(data_frame)
        self.batch_size = batch_size
        self.img_shape = img_shape
        self.augmentation = augmentation
        self.num_classes = num_classes
        print(f"Found {self.train_len} images belonging to {self.num_classes} classes")
        self.on_epoch_end()

    def __len__(self):
        ''' return total number of batches '''
        return math.ceil(self.train_len / self.batch_size)

    def on_epoch_end(self):
        ''' shuffle data after every epoch (model.fit calls this at the end of each epoch) '''
        self.data_frame = self.data_frame.sample(frac=1).reset_index(drop=True)

    def __data_augmentation(self, img):
        ''' apply a random shift and random flips to an (h, w, d) image '''
        # random_shift (legacy Keras 2 API) defaults to channels-first axes,
        # so the channels-last axes are passed explicitly
        img = tf.keras.preprocessing.image.random_shift(
            img, 0.2, 0.3, row_axis=0, col_axis=1, channel_axis=2)
        img = tf.image.random_flip_left_right(img)
        img = tf.image.random_flip_up_down(img)
        return img

    def __get_image(self, file_id):
        """ open the image at path file_id, resize it, and optionally augment it """
        h, w = self.img_shape[:2]
        # np.resize would repeat/truncate the raw pixel array, so resize with PIL;
        # convert("RGB") assumes 3-channel input
        img = Image.open(file_id).convert("RGB").resize((w, h))
        img = np.asarray(img, dtype=np.float32)
        if self.augmentation:
            img = self.__data_augmentation(img)
        img = preprocess_input(img)
        return img

    def __get_label(self, label_id):
        """ uncomment the line below to convert the label into one-hot format """
        # label_id = tf.keras.utils.to_categorical(label_id, self.num_classes)
        return label_id

    def __getitem__(self, idx):
        # positional slice of the (shuffled) DataFrame for this batch
        batch = self.data_frame.iloc[idx * self.batch_size:(idx + 1) * self.batch_size]
        x = [self.__get_image(file_id) for file_id in batch["filenames"]]
        y = [self.__get_label(label_id) for label_id in batch["labels"]]
        return tf.stack(x), tf.convert_to_tensor(y)

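The generator above augments only the input image, and each tf.image.random_flip_* call draws its own random decision, so applying the same calls separately to an image and to its binarized target would flip the pair inconsistently. For the question's image-to-image case, one option is to stack input and target along the channel axis, apply the geometric transforms once, split them again, and add photometric noise to the input only. The following is a minimal NumPy/Pillow sketch under these assumptions: grayscale inputs, targets stored as 0/255 images, pairs given as (x_path, y_path) tuples, and np.roll used as a simple stand-in for a proper shift. The class name PairedImageSequence and all its parameters are hypothetical.

```python
import math

import numpy as np
from PIL import Image

try:
    from tensorflow.keras.utils import Sequence
except ImportError:  # fallback only so the sketch can run without TensorFlow installed
    Sequence = object


class PairedImageSequence(Sequence):
    """Hypothetical generator yielding (x, y) batches where y is the binarized x.

    pairs:    list of (x_path, y_path) tuples with matching images
    img_size: (height, width) that both images are resized to
    """

    def __init__(self, pairs, batch_size=8, img_size=(256, 256),
                 augment=True, shuffle=True, seed=42):
        super().__init__()
        self.pairs = list(pairs)
        self.batch_size = batch_size
        self.img_size = img_size
        self.augment = augment
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.indexes = np.arange(len(self.pairs))
        self.on_epoch_end()

    def __len__(self):
        # batches per epoch; the last batch may be smaller
        return math.ceil(len(self.pairs) / self.batch_size)

    def on_epoch_end(self):
        # reshuffle the sample order so every epoch sees different batches
        if self.shuffle:
            self.rng.shuffle(self.indexes)

    def _load_pair(self, x_path, y_path):
        h, w = self.img_size
        # assumption: grayscale input; use "RGB" and 3 channels for colour scans
        x = Image.open(x_path).convert("L").resize((w, h), Image.BILINEAR)
        # nearest-neighbour resizing keeps the target strictly two-valued
        y = Image.open(y_path).convert("L").resize((w, h), Image.NEAREST)
        x = np.asarray(x, dtype=np.float32)[..., None] / 255.0
        y = (np.asarray(y)[..., None] > 127).astype(np.float32)
        return x, y

    def _augment_pair(self, x, y):
        # stack along channels so each geometric transform hits x and y identically
        stacked = np.concatenate([x, y], axis=-1)
        if self.rng.random() < 0.5:
            stacked = stacked[:, ::-1]  # horizontal flip
        if self.rng.random() < 0.5:
            stacked = stacked[::-1]  # vertical flip
        # random shift of up to 10% per side; np.roll wraps pixels around the border
        max_dy, max_dx = stacked.shape[0] // 10, stacked.shape[1] // 10
        dy = self.rng.integers(-max_dy, max_dy + 1)
        dx = self.rng.integers(-max_dx, max_dx + 1)
        stacked = np.roll(stacked, (dy, dx), axis=(0, 1))
        c = x.shape[-1]
        x, y = stacked[..., :c], stacked[..., c:]
        # photometric noise on x only: the binarized target must not change
        x = np.clip(x + self.rng.normal(0.0, 0.05, x.shape), 0.0, 1.0)
        return x.astype(np.float32), y

    def __getitem__(self, index):
        batch_idx = self.indexes[index * self.batch_size:(index + 1) * self.batch_size]
        xs, ys = [], []
        for i in batch_idx:
            x, y = self._load_pair(*self.pairs[i])
            if self.augment:
                x, y = self._augment_pair(x, y)
            xs.append(x)
            ys.append(y)
        return np.stack(xs), np.stack(ys)
```

With TensorFlow installed, an instance can be passed directly to model.fit(...), which should call on_epoch_end after every epoch. The question's own noise/skew/shear function could replace the Gaussian noise line, as long as any geometric part of it (skew, shear) is applied to the stacked array rather than to x alone.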

Asked: 5 years, 2 months ago
Viewed: 6163 times
Last activity: 2 years, 11 months ago