Tre*_*art 5 python machine-learning tensorflow tensorflow-datasets
I am currently working on a problem in TensorFlow where I need to generate batches in which all tensors in a batch share a specific key value. I am trying to use the Dataset API if possible. Is this doable?
Filter, map, and apply all operate on individual elements; what I need is a way to group by key. I came across tf.data.experimental.group_by_window and tf.data.experimental.group_by_reducer, which look promising, but I have not found a solution yet.
It is best shown with an example:
dataset:
feature,label
1,word1
2,word2
3,word3
1,word1
3,word3
1,word1
1,word1
2,word2
3,word3
1,word1
3,word3
1,word1
1,word1
Grouping by the "feature" key with a maximum batch size of 3 gives the batches:
batch1
[[1,word1],
[1,word1],
[1,word1]]
batch2
[[1,word1],
[1,word1],
[1,word1]]
batch3
[[1,word1]]
batch4
[[2,word2],
[2,word2]]
batch5
[[3,word3],
[3,word3],
[3,word3]]
batch6
[[3,word3]]
Edit: despite the example above, the order of the batches does not matter.
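To make the target behavior precise, here is the desired transformation sketched in plain Python (the bucket_batches helper is hypothetical, not part of tf.data): collect rows into per-key buckets, then split each bucket into batches of at most batch_size.

```python
from collections import defaultdict

def bucket_batches(rows, key_fn, batch_size):
    """Group rows by key, then split each group into batches of at most batch_size."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[key_fn(row)].append(row)
    batches = []
    for key in sorted(buckets):
        group = buckets[key]
        for i in range(0, len(group), batch_size):
            batches.append(group[i:i + batch_size])
    return batches

# The example data: seven 1s, two 2s, four 3s
rows = [(1, 'word1')] * 7 + [(2, 'word2')] * 2 + [(3, 'word3')] * 4
batches = bucket_batches(rows, key_fn=lambda r: r[0], batch_size=3)
for b in batches:
    print(b)
```

This reproduces the six batches listed above: 3 + 3 + 1 rows for key 1, a batch of 2 for key 2, and 3 + 1 for key 3, with every batch holding a single key.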
I think this does the transformation you want:
import tensorflow as tf
import random

random.seed(100)
# Input data
label = list(range(15))
# Shuffle data
random.shuffle(label)
# Make feature from label data
feature = [lbl // 5 for lbl in label]
batch_size = 3
print('Data:')
print(*zip(feature, label), sep='\n')
with tf.Graph().as_default(), tf.Session() as sess:
    # Make dataset from data arrays
    ds = tf.data.Dataset.from_tensor_slices({'feature': feature, 'label': label})
    # Group by window
    ds = ds.apply(tf.data.experimental.group_by_window(
        # Use feature as key
        key_func=lambda elem: tf.to_int64(elem['feature']),
        # Convert each window to a batch
        reduce_func=lambda _, window: window.batch(batch_size),
        # Use batch size as window size
        window_size=batch_size))
    # Iterator
    iter = ds.make_one_shot_iterator().get_next()
    # Show dataset contents
    print('Result:')
    while True:
        try:
            print(sess.run(iter))
        except tf.errors.OutOfRangeError:
            break