How to split images into test and train sets in TensorFlow using my own data

Question

Liz*_*Liz 3 python scikit-learn train-test-split tensorflow2.0

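I'm a bit confused here... I just spent the last hour reading about how to split my dataset into test/train in TensorFlow. I'm following this tutorial to import my images: https://www.tensorflow.org/tutorials/load_data/images. Apparently one can split into train/test with sklearn: model_selection.train_test_split.

But my question is: at what point do I split my dataset into train/test? I have already loaded my dataset (see below), so now what? How do I split it? Do I have to do it before loading the files as a tf.data.Dataset?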

import numpy as np
import tensorflow as tf

# data_dir (a pathlib.Path) and cwd (a str) are assumed to be defined earlier

# determine names of classes
CLASS_NAMES = np.array([item.name for item in data_dir.glob('*') if item.name != "LICENSE.txt"])
print(CLASS_NAMES)

# count images
image_count = len(list(data_dir.glob('*/*.png')))
print(image_count)

# load the files as a tf.data.Dataset
list_ds = tf.data.Dataset.list_files(str(cwd + '/train/' + '*/*'))


Also, my data is structured as shown below. There is no test folder and no val folder. I need to take 20% of that train set for testing.

train
 |__ class 1
 |__ class 2
 |__ class 3


Answer 1

Vla*_*kov 5

You can use tf.keras.preprocessing.image.ImageDataGenerator:

image_generator = tf.keras.preprocessing.image.ImageDataGenerator(validation_split=0.2)
train_data_gen = image_generator.flow_from_directory(directory='train',
                                                     subset='training')
val_data_gen = image_generator.flow_from_directory(directory='train',
                                                   subset='validation')


Note that you may need to set other data-related parameters for the generator.
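
As a rough illustration of what those extra parameters might look like, here is a minimal sketch. The helper name make_generators, the image size, the batch size and the seed are all assumptions made for the example, not values from the question; adjust them to your data. Both iterators read from the same folder, and validation_split decides which files each subset sees.

```python
import tensorflow as tf


def make_generators(train_dir, image_size=(224, 224), batch_size=32, seed=123):
    """Hypothetical helper: return (train, validation) iterators over train_dir."""
    image_generator = tf.keras.preprocessing.image.ImageDataGenerator(
        rescale=1.0 / 255,     # scale pixel values to [0, 1]
        validation_split=0.2,  # reserve 20% of the images for validation
    )
    # Arguments shared by both subsets so they are preprocessed identically.
    common = dict(
        directory=train_dir,
        target_size=image_size,    # images are resized to this size on load
        batch_size=batch_size,
        class_mode='categorical',  # one-hot labels, one class per sub-folder
        seed=seed,
    )
    train_data_gen = image_generator.flow_from_directory(subset='training', **common)
    val_data_gen = image_generator.flow_from_directory(subset='validation', **common)
    return train_data_gen, val_data_gen
```

With the folder layout from the question, this could be called as train_data_gen, val_data_gen = make_generators('train'). As far as I can tell the 20% is taken separately from each class folder, so every class should be represented in both subsets.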

Update: you can get two slices of the dataset with skip() and take():

val_data = data.take(val_data_size)
train_data = data.skip(val_data_size)

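One thing to watch with take() and skip(): if the dataset is reshuffled on every pass, the two slices can end up overlapping. As far as I know, tf.data.Dataset.list_files shuffles by default, so it is probably safer to fix the order before slicing. A minimal sketch, assuming a stand-in dataset of 100 integers in place of the real file list, an arbitrary seed of 42, and a TensorFlow version recent enough to have Dataset.cardinality():

```python
import tensorflow as tf

# Stand-in for the real data; with image files this might instead be
# tf.data.Dataset.list_files('train/*/*', shuffle=False).
data = tf.data.Dataset.range(100)

# Number of elements, and the 20% share held out for validation.
data_size = int(data.cardinality().numpy())
val_data_size = int(0.2 * data_size)

# Shuffle once with a fixed seed and keep that order on every pass,
# so take() and skip() below always see the same sequence.
data = data.shuffle(data_size, seed=42, reshuffle_each_iteration=False)

val_data = data.take(val_data_size)    # first 20% of the shuffled order
train_data = data.skip(val_data_size)  # remaining 80%
```

Any further shuffling, batching or image decoding can then be applied to train_data and val_data separately, after the split.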

Archived:	5 years, 11 months ago
Views:	10516
Last activity:	5 years, 3 months ago