Is there a way to use a tf.data.Dataset inside another Dataset in TensorFlow?

Question


Pio*_*pla 4 python tensorflow tensorflow-datasets

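I'm working on segmentation. Each training sample has multiple images with segmentation masks. I'm trying to write an input_fn that merges all the mask images of each training sample into a single image. I plan to use two Datasets: one that iterates over the sample folders, and another that reads all the masks as one large batch and then merges them into a single tensor.

I get an error when the nested make_one_shot_iterator is called. I know this approach is a bit of a stretch, and most likely Datasets were not designed for this kind of use. But how should I approach the problem so that I can avoid tf.py_func?

Here is a simplified version of the dataset: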

def read_sample(sample_path):
    masks_ds = (tf.data.Dataset.
        list_files(sample_path+"/masks/*.png")
        .map(tf.read_file)
        .map(lambda x: tf.image.decode_image(x, channels=1))
        .batch(1024)) # maximum number of objects
    masks = masks_ds.make_one_shot_iterator().get_next()

    return tf.reduce_max(masks, axis=0)

ds = tf.data.Dataset.from_tensor_slices(tf.gfile.Glob("../input/stage1_train/*"))
ds = ds.map(read_sample)
# ...
sample = ds.make_one_shot_iterator().get_next()
# ...


Answer 1

mrr*_*rry 5

If the nested dataset has only one element, you can use tf.contrib.data.get_single_element() on the nested dataset instead of creating an iterator:

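import tensorflow as tf  # TF 1.x; tf.contrib was removed in TF 2.0

def read_sample(sample_path):
    # Nested dataset: every mask PNG of one sample, batched into one element.
    masks_ds = (tf.data.Dataset.list_files(sample_path+"/masks/*.png")
                .map(tf.read_file)
                .map(lambda x: tf.image.decode_image(x, channels=1))
                .batch(1024)) # maximum number of objects
    # Pull the single batch out as a tensor instead of creating an iterator.
    masks = tf.contrib.data.get_single_element(masks_ds)
    # Merge the masks with an element-wise maximum over the batch axis.
    return tf.reduce_max(masks, axis=0)

ds = tf.data.Dataset.from_tensor_slices(tf.gfile.Glob("../input/stage1_train/*"))
ds = ds.map(read_sample)
sample = ds.make_one_shot_iterator().get_next()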
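
For readers on TensorFlow 2.x, where the tf.contrib namespace no longer exists, the same idea could look roughly like the sketch below. It assumes TF 2.6 or later, where Dataset.get_single_element() is available as a method. The names merge_masks, max_objects, build_dataset and train_glob are introduced here for illustration only, and tf.io.decode_png is swapped in for decode_image on the assumption that all masks are PNG files of the same size.

```python
import tensorflow as tf

def merge_masks(masks_ds, max_objects=1024):
    """Collapse a dataset of [H, W, 1] masks into one [H, W, 1] mask."""
    # Batch all masks into a single element. Assumption: a sample has at
    # most max_objects masks and they all share the same shape.
    batched = masks_ds.batch(max_objects)
    # get_single_element() raises an error unless the dataset contains
    # exactly one element, so an oversized sample fails loudly.
    masks = batched.get_single_element()
    # Element-wise maximum over the batch axis merges the masks.
    return tf.reduce_max(masks, axis=0)

def read_sample(sample_path):
    # Nested dataset over the mask files of one sample folder.
    masks_ds = (tf.data.Dataset.list_files(sample_path + "/masks/*.png")
                .map(tf.io.read_file)
                .map(lambda x: tf.io.decode_png(x, channels=1)))
    return merge_masks(masks_ds)

def build_dataset(train_glob="../input/stage1_train/*"):
    # Outer dataset over the sample folders; each element becomes one merged mask.
    sample_dirs = tf.io.gfile.glob(train_glob)
    return tf.data.Dataset.from_tensor_slices(sample_dirs).map(read_sample)
```

In TF 2.x the result of build_dataset() can be iterated directly with a for loop, so no one-shot iterator is needed.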


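In addition, you can use the tf.data.Dataset.flat_map(), tf.data.Dataset.interleave(), or tf.contrib.data.parallel_interleave() transformations to perform a nested Dataset computation inside a function and flatten the result into a single Dataset. For example, to get all of the samples in a single Dataset: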

import tensorflow as tf  # TF 1.x

def read_all_samples(sample_path):
    # Return the nested dataset itself; flat_map() flattens it into the outer one.
    return (tf.data.Dataset.list_files(sample_path+"/masks/*.png")
            .map(tf.read_file)
            .map(lambda x: tf.image.decode_image(x, channels=1))
            .batch(1024)) # maximum number of objects

ds = tf.data.Dataset.from_tensor_slices(tf.gfile.Glob("../input/stage1_train/*"))
ds = ds.flat_map(read_all_samples)
sample = ds.make_one_shot_iterator().get_next()
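
Note that flat_map() on its own yields the batch of masks for each sample, not a merged mask. A follow-up map() with tf.reduce_max can do the merging. The sketch below illustrates this in TF 2.x (assuming TF 2.4 or later for tf.data.AUTOTUNE). To stay self-contained it uses a small in-memory tensor as a hypothetical stand-in for the sample folders; masks_for_sample, samples and the other names are made up for the example.

```python
import tensorflow as tf

def masks_for_sample(masks, max_objects=1024):
    """Nested Dataset with a single element: all masks of one sample."""
    return tf.data.Dataset.from_tensor_slices(masks).batch(max_objects)

# Hypothetical stand-in for two sample folders with three 2x2 masks each.
# Each mask has a single pixel set; real code would decode PNG files instead.
sample_a = tf.reshape(tf.one_hot([0, 1, 2], 4, dtype=tf.int32), [3, 2, 2, 1])
sample_b = tf.reshape(tf.one_hot([3, 3, 0], 4, dtype=tf.int32), [3, 2, 2, 1])
samples = tf.stack([sample_a, sample_b])  # shape [2, 3, 2, 2, 1]

ds = tf.data.Dataset.from_tensor_slices(samples)

# flat_map flattens the nested datasets: one [3, 2, 2, 1] batch per sample.
flat = ds.flat_map(masks_for_sample)

# A plain map then merges each batch into a single [2, 2, 1] mask.
merged = flat.map(lambda m: tf.reduce_max(m, axis=0))

# interleave is the parallel counterpart; deterministic=True keeps the order.
merged_parallel = ds.interleave(
    masks_for_sample,
    cycle_length=2,
    num_parallel_calls=tf.data.AUTOTUNE,
    deterministic=True,
).map(lambda m: tf.reduce_max(m, axis=0))

for mask in merged:
    print(mask.numpy()[..., 0])
```

Whether this or get_single_element() reads better is mostly a matter of taste; flat_map() keeps the nested function free of tensor extraction, at the cost of an extra map() step.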

