Reading from multiple TFRecord files

Question

yuv*_*blr 0 tensorflow-datasets tensorflow2.0

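I am working with multiple TFRecord files and want to read from them to create a dataset. I am trying to build a dataset of the file paths with from_tensor_slices and then use that dataset to read the TFRecords.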

(Benefits of multiple TFRecord files: https://datascience.stackexchange.com/questions/16318/what-is-the-benefit-of-splitting-tfrecord-file-into-shards)

I would like to know whether there is a simpler, proven way to do this.

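import tensorflow as tf

# Dataset of file paths; filenames_full is a list of TFRecord paths.
file_names_dataset = tf.data.Dataset.from_tensor_slices(filenames_full)

def read(inp):
    # Open one TFRecord file as a dataset of serialized records.
    return tf.data.TFRecordDataset(inp)

# flat_map flattens the per-file datasets into one stream of records.
file_content = file_names_dataset.flat_map(read)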


My next step is to parse the dataset with tf.io.parse_single_example.
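For reference, the per-record parsing step could look roughly like the sketch below. It is only an illustration: the feature spec (`feature_spec`, a single int64 `label`), the helper `make_example`, and the in-memory records standing in for the file contents are all assumptions, not part of the original question.

```python
import tensorflow as tf

# Hypothetical feature spec: one scalar int64 "label" per record.
feature_spec = {"label": tf.io.FixedLenFeature([], tf.int64)}

def make_example(label):
    # Hypothetical helper: serialize a tf.train.Example with one int64 feature.
    feature = {
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label]))
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()

# Stand-in for the serialized records read from the TFRecord files.
serialized = tf.data.Dataset.from_tensor_slices(
    [make_example(i) for i in range(4)]
)

# Parse one serialized Example at a time.
parsed = serialized.map(
    lambda record: tf.io.parse_single_example(record, feature_spec)
)

labels = [int(item["label"]) for item in parsed]
```

Each element of `parsed` is a dict mapping feature names to tensors, here a scalar `label`.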

Answer 1

Ale*_*NON 5

The tf.data.TFRecordDataset constructor already accepts a list or a tensor of filenames, so you can call it directly with your filenames: file_content = tf.data.TFRecordDataset(filenames_full)
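A minimal, self-contained sketch of this, assuming three small shard files written to a temporary directory (the file names and record payloads are made up for illustration):

```python
import os
import tempfile

import tensorflow as tf

# Write three hypothetical shard files, two raw byte records each.
tmp_dir = tempfile.mkdtemp()
filenames_full = []
for shard in range(3):
    path = os.path.join(tmp_dir, f"shard-{shard}.tfrecord")
    with tf.io.TFRecordWriter(path) as writer:
        for i in range(2):
            writer.write(f"shard{shard}-record{i}".encode("utf-8"))
    filenames_full.append(path)

# Pass the whole list of paths straight to the constructor.
file_content = tf.data.TFRecordDataset(filenames_full)

# With the default settings the files are read one after another.
records = [r.numpy().decode("utf-8") for r in file_content]
```

Here the records come back in file order because `num_parallel_reads` is left at its default; setting it would read several files in parallel and interleave their records.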

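From the tf.io.parse_single_example documentation:

One might see performance advantages by batching Example protos with parse_example instead of using this function directly.

I would therefore recommend batching the dataset before mapping the tf.io.parse_example function over it: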

tf.data.TFRecordDataset(
  filenames_full
).batch(
  my_batch_size
).map(
  lambda batch: tf.io.parse_example(batch, my_features)
)

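To try the batched variant end to end, something like the following sketch should work. The values of `my_batch_size` and `my_features`, the `serialize` helper, and the two generated files are assumptions chosen for illustration; substitute your own feature spec and paths.

```python
import os
import tempfile

import tensorflow as tf

my_batch_size = 4  # hypothetical value
# Hypothetical feature spec: a length-2 float vector and a scalar int.
my_features = {
    "x": tf.io.FixedLenFeature([2], tf.float32),
    "y": tf.io.FixedLenFeature([], tf.int64),
}

def serialize(x, y):
    # Hypothetical helper: build and serialize one tf.train.Example.
    feature = {
        "x": tf.train.Feature(float_list=tf.train.FloatList(value=x)),
        "y": tf.train.Feature(int64_list=tf.train.Int64List(value=[y])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()

# Write two files with five records each.
tmp_dir = tempfile.mkdtemp()
filenames_full = []
for shard in range(2):
    path = os.path.join(tmp_dir, f"part-{shard}.tfrecord")
    with tf.io.TFRecordWriter(path) as writer:
        for i in range(5):
            writer.write(serialize([float(i), float(shard)], i))
    filenames_full.append(path)

# Batch the serialized records first, then parse a whole batch per call.
dataset = (
    tf.data.TFRecordDataset(filenames_full)
    .batch(my_batch_size)
    .map(lambda batch: tf.io.parse_example(batch, my_features))
)

shapes = [b["x"].shape.as_list() for b in dataset]
```

Because the records are batched before parsing, each parsed feature carries a leading batch dimension, and the last batch is smaller when the record count is not a multiple of `my_batch_size`.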

If you want a complete example, I shared my input pipeline (which reads from many TFRecord files) in this post.

Kind regards, Alexis.

Archived:	6 years, 4 months ago
Views:	2,423
Last recorded:	6 years, 2 months ago