Extracting data from a TensorFlow dataset (e.g. to NumPy)

Ste*_*ant 0 python keras tensorflow tensorflow-datasets

I am loading images with

data = keras.preprocessing.image_dataset_from_directory(
  './data', 
  labels='inferred', 
  label_mode='binary', 
  validation_split=0.2, 
  subset="training", 
  image_size=(img_height, img_width), 
  batch_size=sz_batch, 
  crop_to_aspect_ratio=True
)

I also want to use the resulting data in non-TensorFlow routines, so I would like to extract it into NumPy arrays. How can I achieve this? I cannot use tfds.

Alo*_*her 5

I suggest unbatching your dataset and using tf.data.Dataset.map:

import pathlib

import numpy as np
import tensorflow as tf

dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)
batch_size = 32

train_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(180, 180),
  batch_size=batch_size,
  shuffle=False)

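# Split the batches into individual (image, label) elements, then collect each
# component in its own pass; shuffle=False keeps both passes in the same order.
train_ds = train_ds.unbatch()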
images = np.asarray(list(train_ds.map(lambda x, y: x)))
labels = np.asarray(list(train_ds.map(lambda x, y: y)))

Alternatively, as suggested in the comments, you can skip the unbatch step, keep the batches, and concatenate them afterwards:

images = np.concatenate(list(train_ds.map(lambda x, y: x)))
labels = np.concatenate(list(train_ds.map(lambda x, y: y)))
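To illustrate why the batched variant uses np.concatenate rather than np.asarray, here is a minimal NumPy-only sketch. The array `data` and the batch size of 4 are made-up stand-ins for what a batched dataset would yield, not part of the original answer. Each batch already has a leading batch axis and the final batch may be smaller, so the batches are joined along axis 0 instead of being stacked.

```python
import numpy as np

# Hypothetical stand-in for a batched dataset: 10 "images" of shape (4, 4, 3),
# split into batches of 4, so the final batch holds only 2 items.
data = np.arange(10 * 4 * 4 * 3, dtype=np.float32).reshape(10, 4, 4, 3)
batches = [data[i:i + 4] for i in range(0, len(data), 4)]

# The batches do not all share the same shape, so they cannot be stacked
# into one regular array.
batch_shapes = [b.shape for b in batches]

# np.concatenate joins along the existing batch axis (axis 0) and therefore
# copes with the smaller final batch.
merged = np.concatenate(batches)
```

After this, `merged` has one row per image again, in the original order.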

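Or, if you set shuffle=True, use tf.TensorArray and collect images and labels in a single pass, so that they stay paired even if the dataset is reshuffled on each iteration: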

images = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
labels = tf.TensorArray(dtype=tf.int32, size=0, dynamic_size=True)

for x, y in train_ds.unbatch():
  images = images.write(images.size(), x)
  labels = labels.write(labels.size(), y)

# stack() already returns one tensor; .numpy() converts it to a NumPy array
images = images.stack().numpy()
labels = labels.stack().numpy()
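Another single-pass option, not part of the answer above, is tf.data.Dataset.as_numpy_iterator, which yields each element as NumPy arrays directly. The sketch below assumes eager execution and uses a small synthetic dataset (`images_in`, `labels_in` and the batch size of 4 are hypothetical) in place of the output of image_dataset_from_directory.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for the image dataset: 10 images of shape (4, 4, 3)
# with integer labels, delivered in batches of 4.
images_in = np.arange(10 * 4 * 4 * 3, dtype=np.float32).reshape(10, 4, 4, 3)
labels_in = np.arange(10, dtype=np.int32)
ds = tf.data.Dataset.from_tensor_slices((images_in, labels_in)).batch(4)

# Iterate once; each element is an (image_batch, label_batch) pair of
# NumPy arrays, so images and labels stay paired.
image_batches, label_batches = zip(*ds.as_numpy_iterator())

# Join the batches along the batch axis.
images = np.concatenate(image_batches)
labels = np.concatenate(label_batches)
```

Because the dataset is traversed only once, this should also keep images and labels aligned when shuffling is enabled.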