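The TensorFlow manual for the Dataset class shows how to shuffle the data and how to batch it. However, it is not apparent how to reshuffle the data each epoch. I tried the code below, but in the second epoch the data comes out in exactly the same order as in the first. Does anybody know how to shuffle between epochs using a Dataset?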
import tensorflow as tf

n_epochs = 2
batch_size = 3
data = tf.contrib.data.Dataset.range(12)
data = data.repeat(n_epochs)
data = data.batch(batch_size)
next_batch = data.make_one_shot_iterator().get_next()
sess = tf.Session()
for _ in range(4):
    print(sess.run(next_batch))
print("new epoch")
data = data.shuffle(12)
for _ in range(4):
    print(sess.run(next_batch))
My environment: Python 3.6, TensorFlow 1.4.
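As of TensorFlow 1.4, Dataset is available under tf.data.
You should be careful about where you place data.shuffle. In your code, repeat has already put every epoch of data into the dataset before shuffle is called. In addition, data.shuffle(12) returns a new dataset, while next_batch still belongs to the iterator created earlier, so that shuffle never affects the batches you print. Here are two working examples of shuffling a dataset.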
Shuffle all elements
# shuffle all elements
import tensorflow as tf

n_epochs = 2
batch_size = 3
# buffer_size is smaller than the 12 elements, so the shuffle is only
# approximate; use buffer_size >= 12 for a uniform shuffle.
buffer_size = 5

dataset = tf.data.Dataset.range(12)
# Shuffle individual elements first, then batch, then repeat.
dataset = dataset.shuffle(buffer_size=buffer_size)
dataset = dataset.batch(batch_size)
dataset = dataset.repeat(n_epochs)
iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()

sess = tf.Session()
print("epoch 1")
for _ in range(4):
    print(sess.run(next_batch))
print("epoch 2")
for _ in range(4):
    print(sess.run(next_batch))
OUTPUT:
epoch 1
[1 4 5]
[3 0 7]
[6 9 8]
[10 2 11]
epoch 2
[2 0 6]
[1 7 4]
[5 3 8]
[11 9 10]
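To see what buffer_size does in the example above, here is a minimal pure-Python sketch of a buffered shuffle. It is only an illustrative model of the documented behavior of Dataset.shuffle (fill a buffer, emit a random element from it, refill from the input), not TensorFlow's actual implementation. The name buffered_shuffle is hypothetical, and buffer_size is assumed to be a positive integer.

```python
import random


def buffered_shuffle(items, buffer_size, rng):
    """Hypothetical model of a shuffle buffer (not TensorFlow code)."""
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer; nothing is emitted yet.
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


rng = random.Random(0)
# With buffer_size=5 the first output can only be one of 0..4.
partial = list(buffered_shuffle(range(12), buffer_size=5, rng=rng))
# With buffer_size=12 the whole input sits in the buffer before anything is emitted.
full = list(buffered_shuffle(range(12), buffer_size=12, rng=rng))
print(partial)
print(full)
```

Under this model, a buffer smaller than the dataset only mixes elements locally, which is consistent with the first batch of each epoch above containing small values. A buffer at least as large as the dataset is needed for every ordering to be possible.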
Shuffle between batches, not within a batch
# shuffle between batches, not shuffle in a batch
import tensorflow as tf

n_epochs = 2
batch_size = 3
buffer_size = 5  # counted in batches here, because shuffle runs after batch

dataset = tf.data.Dataset.range(12)
# Batch first so each batch keeps its internal order, then shuffle the batches.
dataset = dataset.batch(batch_size)
# repeat comes before shuffle, so batches from different epochs can mix.
dataset = dataset.repeat(n_epochs)
dataset = dataset.shuffle(buffer_size=buffer_size)
iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()

sess = tf.Session()
print("epoch 1")
for _ in range(4):
    print(sess.run(next_batch))
print("epoch 2")
for _ in range(4):
    print(sess.run(next_batch))
OUTPUT:
epoch 1
[0 1 2]
[6 7 8]
[3 4 5]
[6 7 8]
epoch 2
[3 4 5]
[0 1 2]
[ 9 10 11]
[ 9 10 11]
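Note that [6 7 8] appears twice under "epoch 1" in the output above, and [ 9 10 11] twice under "epoch 2". A plausible explanation is the order of the calls: with repeat before shuffle, a single shuffle buffer spans the epoch boundary. The pure-Python sketch below models both orderings. It is an assumption-laden illustration, not TensorFlow code: buffered_shuffle is a hypothetical helper, and shuffle-then-repeat is modeled as a fresh shuffle per epoch.

```python
import random


def buffered_shuffle(items, buffer_size, rng):
    """Hypothetical model of a shuffle buffer (not TensorFlow code)."""
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]        # emit a random buffered element
        buffer[idx] = item       # and replace it with the new one
    rng.shuffle(buffer)
    yield from buffer            # drain the rest in random order


rng = random.Random(0)
n_epochs = 2
# Four batches of three consecutive integers, as produced by batch(3) on range(12).
batch_list = [tuple(range(i, i + 3)) for i in range(0, 12, 3)]

# repeat, then shuffle: one buffer sees batches from both epochs.
repeated = batch_list * n_epochs
mixed = list(buffered_shuffle(repeated, 5, rng))

# shuffle, then repeat: each epoch is shuffled on its own.
per_epoch = []
for _ in range(n_epochs):
    per_epoch.extend(buffered_shuffle(batch_list, 5, rng))

print("repeat -> shuffle:", mixed[:4], "|", mixed[4:])
print("shuffle -> repeat:", per_epoch[:4], "|", per_epoch[4:])
```

In this model, only the second ordering guarantees that every group of four batches contains each batch exactly once. Assuming the real pipeline behaves the same way, calling shuffle after batch but before repeat would shuffle whole batches while keeping each epoch intact.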