How to use the "group_by_window" function in TensorFlow

Question

Joh*_*aro 9 python tensorflow tensorflow-datasets

TensorFlow's new set of input pipeline functions includes a "group_by_window" function for grouping records together. It is described in the documentation here:

https://www.tensorflow.org/api_docs/python/tf/contrib/data/Dataset#group_by_window

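I don't fully understand the explanation given for the function, and I tend to learn by example. I can't find any example code for it anywhere on the internet. Could someone whip up a barebones, runnable example that shows how the function works and what to pass to it?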

Answer 1

Max*_*uyn 10

For TensorFlow version 1.9.0, here is a quick example I could come up with:

import tensorflow as tf
import numpy as np
components = np.arange(100).astype(np.int64)
dataset = tf.data.Dataset.from_tensor_slices(components)
dataset = dataset.apply(tf.contrib.data.group_by_window(
    key_func=lambda x: x % 2,  # key 0 for even numbers, key 1 for odd numbers
    reduce_func=lambda _, els: els.batch(10),  # batch each window in tens
    window_size=100))
iterator = dataset.make_one_shot_iterator()
features = iterator.get_next()
sess = tf.Session()
sess.run(features) # array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18], dtype=int64)

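The first argument, key_func, maps each element in the dataset to a key.

The window_size defines the size of the bucket that is handed to reduce_func.

In reduce_func you receive a block of up to window_size elements that share the same key. You can shuffle, batch, or pad it however you like.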
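
To make the roles of the three arguments more concrete, here is a rough pure-Python model of the grouping behaviour. It is only a sketch and not TensorFlow's implementation: the `group_by_window` generator and the `batch` helper below are hypothetical stand-ins written for illustration, and the order in which leftover partial windows are flushed at the end of the input is an assumption.

```python
from collections import OrderedDict


def group_by_window(elements, key_func, reduce_func, window_size):
    """Pure-Python model of the grouping behaviour (not TensorFlow code)."""
    windows = OrderedDict()  # key -> elements collected so far for that key
    for element in elements:
        key = key_func(element)
        window = windows.setdefault(key, [])
        window.append(element)
        if len(window) == window_size:
            # A full window is handed to reduce_func and then discarded.
            for out in reduce_func(key, windows.pop(key)):
                yield out
    # Assumption: leftover partial windows are flushed in insertion order.
    for key, window in windows.items():
        for out in reduce_func(key, window):
            yield out


def batch(window, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [window[i:i + batch_size] for i in range(0, len(window), batch_size)]


# Same setup as the example above: 100 integers keyed by parity.
batches = list(group_by_window(
    range(100),
    key_func=lambda x: x % 2,
    reduce_func=lambda _, els: batch(els, 10),
    window_size=100))

# A smaller window makes the "emit when full" behaviour visible.
small = list(group_by_window(
    range(10),
    key_func=lambda x: x % 2,
    reduce_func=lambda _, els: [els],
    window_size=4))
```

In this model `batches[0]` holds the first ten even numbers, matching the output of the TensorFlow example, and `small` comes out as `[0, 2, 4, 6]`, `[1, 3, 5, 7]`, `[8]`, `[9]`: a window is emitted as soon as its key has collected window_size elements, and whatever is left over is emitted at the end.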

Edit: more on dynamic padding and bucketing with the group_by_window function:

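If you have a tf.data.Dataset whose elements hold a sequence, its sequence_length, and a label, ordered as (sequence_length, label, sequence) to match the code below, where sequence is a tf.int64 tensor: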

def bucketing_fn(sequence_length, buckets):
    """Given a sequence_length returns a bucket id"""
    t = tf.clip_by_value(buckets, 0, sequence_length)
    return tf.argmax(t)

def reduc_fn(key, elements, window_size):
    """Receives `window_size` elements"""
    return elements.shuffle(window_size, seed=0)

# Create buckets from 0 to 495 with an increment of 15 -> [0, 15, 30, ... , 495]
buckets = [tf.constant(num, dtype=tf.int64) for num in range(0, 500, 15)]
window_size = 1000
# Bucketing (tf.data.Dataset in 1.9.0 has no group_by_window method, so use apply)
dataset = dataset.apply(tf.contrib.data.group_by_window(
        lambda x, y, z: bucketing_fn(x, buckets),
        lambda key, x: reduc_fn(key, x, window_size), window_size))
# You could pad it in the reduc_func, but I'll do it here for clarity
# The last element of the dataset is the dynamic sentences. By giving it tf.Dimension(None) it will pad the sentences (with 0) according to the longest sentence in each batch.
dataset = dataset.padded_batch(batch_size, padded_shapes=(
        tf.TensorShape([]), tf.TensorShape([]), tf.Dimension(None)))
dataset = dataset.repeat(num_epochs)
iterator = dataset.make_one_shot_iterator()
features = iterator.get_next()
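
To see what the bucketing and padding steps do to concrete values, here is a small NumPy sketch. It is not TensorFlow code: `bucket_id` mirrors the clip-then-argmax trick of bucketing_fn, and `pad_batch` is a hypothetical helper that imitates zero-padding a batch to its longest sequence. The sample lengths and sequences are made up for illustration.

```python
import numpy as np

# Same boundaries as range(0, 500, 15) above: 0, 15, ..., 495.
buckets = np.arange(0, 500, 15, dtype=np.int64)


def bucket_id(sequence_length, buckets):
    """NumPy mirror of bucketing_fn.

    Returns the index of the first boundary >= sequence_length, or the
    index of the last boundary when the length exceeds all of them.
    """
    clipped = np.clip(buckets, 0, sequence_length)
    # argmax returns the first position holding the maximum value.
    return int(np.argmax(clipped))


def pad_batch(sequences):
    """Zero-pad variable-length sequences to the longest one in the batch."""
    longest = max(len(s) for s in sequences)
    padded = np.zeros((len(sequences), longest), dtype=np.int64)
    for row, seq in enumerate(sequences):
        padded[row, :len(seq)] = seq
    return padded


# Illustrative lengths: shorter sequences land in lower bucket ids.
ids = [bucket_id(n, buckets) for n in (0, 15, 40, 1000)]

# Illustrative batch of three sequences with different lengths.
padded = pad_batch([[1, 2, 3], [4], [5, 6]])
```

Here `ids` is `[0, 1, 3, 33]`: a length of 40 is clipped against the boundaries 0, 15, 30, 45, ... and the first boundary that reaches it is 45 at index 3, while a length of 1000 exceeds every boundary and falls into the last bucket. Because sequences of similar length share a bucket, each padded batch wastes little space on zeros.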

Asked: 8 years, 5 months ago
Viewed: 2868 times
Last active: 7 years, 2 months ago