Tag: tensorflow-datasets

Padding the MNIST dataset with tf.pad()

How can I use tf.pad() in TensorFlow to pad MNIST images of shape (?, 28, 28, 1) to shape (?, 32, 32, 1)?
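
A minimal sketch of one way to do this, assuming TensorFlow 2.x with eager execution and a batch of zeros standing in for real MNIST images. The paddings list has one [before, after] pair per dimension, so only the height and width axes grow.

```python
import tensorflow as tf

# Stand-in for a batch of MNIST images; real data would come from a dataset loader.
images = tf.zeros([4, 28, 28, 1], dtype=tf.float32)

# One [before, after] pair per dimension: leave the batch and channel axes alone,
# add 2 pixels of zeros on each side of height and width (28 + 2 + 2 = 32).
paddings = [[0, 0], [2, 2], [2, 2], [0, 0]]
padded = tf.pad(images, paddings, mode="CONSTANT", constant_values=0)

print(padded.shape)  # (4, 32, 32, 1)
```

The same call can be placed inside a Dataset.map() function if the padding should happen in the input pipeline.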

python-3.x tensorflow tensorflow-datasets

Ank*_*ava

2017 10-20

3
Votes

1
Answers

1365
Views

TensorFlow TFRecordDataset shuffle buffer_size behavior

It is unclear to me what the buffer_size parameter does when shuffling a tf.TFRecordDataset. Suppose we have the following code:

dataset = dataset.shuffle(buffer_size=10000).repeat().batch(batch_size)

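Does this mean that only the first 10k samples will be used and repeated forever, or will I go through the entire dataset? If not, what exactly does it do? And what about this code?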

dataset = dataset.repeat().shuffle(buffer_size=10000).batch(batch_size)

I noticed this post, but it does not say anything about buffer_size.
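
The documented behavior of Dataset.shuffle() is that it fills a buffer with buffer_size elements, emits one at random, and replaces it with the next input element. The sketch below is a pure-Python simulation of that idea, not TensorFlow's implementation; buffered_shuffle and its seed argument are hypothetical names used only for illustration. It shows that every element is still visited exactly once, while an element can only move a limited distance toward the front.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in an order a fixed-size shuffle buffer could produce."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer: nothing is emitted yet.
            buffer.append(item)
            continue
        # Buffer is full: emit a random slot and refill it with the new item.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain the remaining elements in random order.
    rng.shuffle(buffer)
    yield from buffer

data = list(range(20))
shuffled = list(buffered_shuffle(data, buffer_size=5))

print(shuffled)
print(sorted(shuffled) == data)  # every element appears exactly once
```

With this model, shuffle(...).repeat() shuffles within each pass over the data, whereas repeat().shuffle(...) lets the buffer mix elements from neighboring epochs.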

python tensorflow tensorflow-datasets

dem*_*ies

lucky-day

3
Votes

1
Answers

4309
Views

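How to checkpoint a tf.data Dataset object?

When checkpointing during training (in case of a crash, etc.), I save the graph and parameters, but it is unclear how to do the same for the new tf.data objects used for input.

Is there a straightforward way to checkpoint these so that I can resume within the current epoch, or restore the shuffle state (perhaps from a seed)?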
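
A minimal sketch, assuming TensorFlow 2.x with eager execution (the question may predate it): a dataset iterator can be tracked by tf.train.Checkpoint, so its position is saved and restored together with other state. The temporary directory and the name "input_ckpt" are placeholders.

```python
import os
import tempfile

import tensorflow as tf

dataset = tf.data.Dataset.range(10)
iterator = iter(dataset)

# The iterator is tracked like a variable, so its position goes into the checkpoint.
ckpt = tf.train.Checkpoint(iterator=iterator)

first = [int(next(iterator).numpy()) for _ in range(3)]    # consumes 0, 1, 2
save_path = ckpt.save(os.path.join(tempfile.mkdtemp(), "input_ckpt"))

skipped = [int(next(iterator).numpy()) for _ in range(2)]  # consumes 3, 4

# Restoring rewinds the iterator to where it was when the checkpoint was saved.
ckpt.restore(save_path)
resumed = [int(next(iterator).numpy()) for _ in range(2)]

print(first, skipped, resumed)
```

In a training loop the same Checkpoint object would usually also hold the model and optimizer, so that all of them are restored to a consistent step.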

tensorflow tensorflow-datasets

mac*_*aut

lucky-day

3
Votes

1
Answers

1471
Views

How to use Keras' predict_on_batch in tf.data.Dataset.map()?

I would like to find a way to use Keras' predict_on_batch inside tf.data.Dataset.map() in TF 2.0.

Suppose I have a NumPy dataset

n_data = 10**5
my_data    = np.random.random((n_data,10,1))
my_targets = np.random.randint(0,2,(n_data,1))

data = ({'x_input':my_data}, {'target':my_targets})

and a tf.keras model

x_input = Input((None,1), name = 'x_input')
RNN     = SimpleRNN(100,  name = 'RNN')(x_input)
dense   = Dense(1, name = 'target')(RNN)

my_model = Model(inputs = [x_input], outputs = [dense])
my_model.compile(optimizer='SGD', loss = 'binary_crossentropy')

I can create a batched dataset

dataset = tf.data.Dataset.from_tensor_slices(data)
dataset = dataset.batch(10)
prediction_dataset = dataset.map(transform_predictions)

where transform_predictions is a user-defined function that gets the predictions from predict_on_batch

def transform_predictions(inputs, outputs):
    predictions = my_model.predict_on_batch(inputs)
    # predictions = …

python keras tensorflow tensorflow-datasets tensorflow2.0

And*_*ers

2019 04-03

3
Votes

1
Answers

3841
Views

How to print one example of a dataset from tf.data?

I have a tf.data dataset. How can I easily print (or grab) a single element of it?

Something like:

print(dataset[0])
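
A Dataset does not support indexing, but in TensorFlow 2.x with eager execution it is iterable. A minimal sketch, using a small made-up dataset, of two common ways to look at the first element:

```python
import tensorflow as tf

# Small made-up dataset standing in for the real one.
dataset = tf.data.Dataset.from_tensor_slices([10, 20, 30])

# Option 1: build an iterator and pull a single element from it.
first = next(iter(dataset))
print(first.numpy())

# Option 2: restrict the dataset to one element and loop over it.
for element in dataset.take(1):
    print(element.numpy())
```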

python tensorflow tensorflow-datasets tensorflow2.0

Nic*_*ler

lucky-day

3
Votes

1
Answers

6602
Views

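Using a validation set with TensorFlow Datasets

From the guide "Train and evaluate with Keras":

The argument validation_split (generating a holdout set from the training data) is not supported when training from Dataset objects, since this feature requires the ability to index the samples of the datasets, which is not possible in general with the Dataset API.

Is there a workaround? How can I still use a validation set with TF datasets?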
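
One possible workaround, sketched under the assumption that the number of examples is known: carve the validation set out of the dataset with take() and skip(). The sizes and the range dataset below are made up for illustration.

```python
import tensorflow as tf

# Stand-in for a dataset of (features, label) examples.
full_ds = tf.data.Dataset.range(100)
val_size = 20

# The first val_size examples become the validation set, the rest the training set.
val_ds = full_ds.take(val_size).batch(10)
train_ds = full_ds.skip(val_size).batch(10)

val_count = sum(int(batch.shape[0]) for batch in val_ds)
train_count = sum(int(batch.shape[0]) for batch in train_ds)
print(val_count, train_count)
```

The two datasets can then be passed separately, for example as model.fit(train_ds, validation_data=val_ds), where model is whatever Keras model is being trained. If the data is shuffled before the split, the shuffle should not reshuffle on every iteration, otherwise examples can move between the two sets across epochs.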

python machine-learning keras tensorflow tensorflow-datasets

Luk*_*sen

2020 05-05

3
Votes

1
Answers

5186
Views

How to apply a map function to a tf.Tensor

dataset = tf.data.Dataset.from_tensor_slices((images,boxes))
function_to_map = lambda x,y: func3(x,y)
fast_benchmark(dataset.map(function_to_map).batch(1).prefetch(tf.data.experimental.AUTOTUNE))

Here is the code I currently have around func3:

def fast_benchmark(dataset, num_epochs=2):
    start_time = time.perf_counter()
    print('dataset->',dataset)
    for _ in tf.data.Dataset.range(num_epochs):
        for _,__ in dataset:
            print(_,__)
            break
            pass

The output of print is

tf.Tensor([b'/media/jake/mark-4tb3/input/datasets/pascal/VOCtrainval_11-May-2012/VOCdevkit/VOC2012/JPEGImages/2008_000008.jpg'], shape=(1,), dtype=string) <tf.RaggedTensor [[[52, 86, 470, 419], [157, 43, 288, 166]]]>

What I want to do in func3():
turn the image path into the actual image and then run the batches
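
A minimal sketch of what such a func3 could look like, assuming JPEG files and TensorFlow 2.x. To stay runnable without the VOC files it first writes a tiny placeholder JPEG, and it uses one dense box per image instead of the ragged boxes shown in the output above; both are simplifications for illustration.

```python
import os
import tempfile

import tensorflow as tf

# Write a tiny placeholder JPEG so the sketch runs without the real image files.
path = os.path.join(tempfile.mkdtemp(), "sample.jpg")
tf.io.write_file(path, tf.io.encode_jpeg(tf.zeros([8, 8, 3], dtype=tf.uint8)))

images = [path]
boxes = [[52, 86, 470, 419]]  # one box per image keeps this example dense

def func3(image_path, box):
    # Hypothetical body: replace the path string with the decoded pixels.
    raw = tf.io.read_file(image_path)
    image = tf.io.decode_jpeg(raw, channels=3)
    return image, box

dataset = tf.data.Dataset.from_tensor_slices((images, boxes))
dataset = dataset.map(func3).batch(1)

image_batch, box_batch = next(iter(dataset))
print(image_batch.shape, box_batch.shape)
```

With a batch size larger than 1, images of different sizes would need to be resized to a common shape (for example with tf.image.resize) inside func3 before batching.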

python tensorflow tensor tensorflow-datasets tensorflow2.0

Author

lucky-day

3
Votes

1
Answers

4974
Views

How to use tf.data.Dataset.interleave() with a custom function in TF 2?

I am using TF 2.2 and trying to create a pipeline with tf.data.

The following works fine:

def load_image(filePath, label):

    print('Loading File: {}' + filePath)
    raw_bytes = tf.io.read_file(filePath)
    image = tf.io.decode_image(raw_bytes, expand_animations = False)

    return image, label

# TrainDS Pipeline
trainDS = getDataset()
trainDS = trainDS.shuffle(size['train'])
trainDS = trainDS.map(load_image, num_parallel_calls=AUTOTUNE)

for d in trainDS:
    print('Image: {} - Label: {}'.format(d[0], d[1]))

I would like to use load_image() together with Dataset.interleave(). So I tried:

# TrainDS Pipeline
trainDS = getDataset()
trainDS = trainDS.shuffle(size['train'])
trainDS = trainDS.interleave(lambda x, y: load_image_with_label(x, y), cycle_length=4)

for d in trainDS:
    print('Image: {} - Label: …
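
Unlike map(), the function passed to interleave() has to return a Dataset for each input element. A minimal sketch of one way to adapt a per-element loader, assuming TensorFlow 2.x; here load_image is a stand-in that returns the length of the path string instead of reading a file, so that the example runs without any images on disk.

```python
import tensorflow as tf

def load_image(file_path, label):
    # Stand-in for the real loader: uses the path length as the "image".
    return tf.strings.length(file_path), label

paths = ["a.jpg", "bb.jpg", "ccc.jpg"]
labels = [0, 1, 0]
train_ds = tf.data.Dataset.from_tensor_slices((paths, labels))

# Wrap each (path, label) pair in a one-element Dataset, then map the loader over it.
train_ds = train_ds.interleave(
    lambda x, y: tf.data.Dataset.from_tensors((x, y)).map(load_image),
    cycle_length=4,
)

results = [(int(image.numpy()), int(label.numpy())) for image, label in train_ds]
print(results)
```

If the goal is only to run the loader in parallel, map() with num_parallel_calls, as in the working pipeline above, may already be sufficient.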

tensorflow tensorflow-datasets tensorflow2.0

Kle*_*ios

lucky-day

3
Votes

1
Answers

2071
Views

How to unbatch a TensorFlow 2.0 dataset

I have a dataset that I created with tf.data.Dataset using the following code:

dataset = Dataset.from_tensor_slices(corona_new)
dataset = dataset.window(WINDOW_SIZE, 1, drop_remainder=True)
dataset = dataset.flat_map(lambda x: x.batch(WINDOW_SIZE))
dataset = dataset.map(lambda x: tf.transpose(x))

for i in dataset:
    print(i.numpy())
    break

When I run it, I get the following output (this is an example of one batch):

[[  0. 125. 111. 232. 164. 134. 235. 190.] 
 [  0.  14.  16.   7.   9.   7.   6.   8.]
 [  0. 132. 199. 158. 148. 141. 179. 174.]
 [  0.   0.   0.   2.   0.   2.   1.   2.]
 [  0.   0.   0.   0.   3.   5.   0.   0.]]

How can I unbatch them?
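
A minimal sketch, assuming a TensorFlow 2.x version in which Dataset.unbatch() is available as a method. It rebuilds the pipeline from the question on made-up data (corona_new here is a hypothetical 20 x 5 array) and then splits each 5 x 8 element into its 5 rows.

```python
import numpy as np
import tensorflow as tf

WINDOW_SIZE = 8
# Hypothetical stand-in for the real data: 20 time steps, 5 features.
corona_new = np.arange(20 * 5, dtype=np.float32).reshape(20, 5)

dataset = tf.data.Dataset.from_tensor_slices(corona_new)
dataset = dataset.window(WINDOW_SIZE, 1, drop_remainder=True)
dataset = dataset.flat_map(lambda x: x.batch(WINDOW_SIZE))
dataset = dataset.map(lambda x: tf.transpose(x))

# unbatch() splits every element along its first axis into separate elements.
unbatched = dataset.unbatch()
rows = [row.numpy() for row in unbatched]

print(len(rows), rows[0].shape)
```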

python machine-learning keras tensorflow tensorflow-datasets

Tom*_*t45

2020 07-08

3
Votes

1
Answers

5381
Views

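module 'tensorflow_datasets.core.features' has no attribute 'text'

Hi everyone, I am working on sentiment analysis with TensorFlow, using reviews of Amazon electronics products. I ran into an error in my code: I use TensorFlow Datasets to retrieve some text, but the retrieval fails. Here is the part of the code that raises the error: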

tokenizer = tfds.features.text.Tokenizer()

vocabulary_set = set()
for _, reviews in train_dataset.enumerate():
    review_text = reviews['data']
    reviews_tokens = tokenizer.tokenize(review_text.get('review_body').numpy())
    vocabulary_set.update(reviews_tokens)
vocab_size = len(vocabulary_set)
vocab_size

The error I get here is an AttributeError:

AttributeError                            Traceback (most recent call last)
<ipython-input-17-1c32dce13853> in <module>()
----> 1 tokenizer = tfds.features.text.Tokenizer()
AttributeError: module 'tensorflow_datasets.core.features' has no attribute 'text'

How can I fix this error? Thanks.
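
As far as I know, newer tensorflow_datasets releases moved these text helpers out of tfds.features.text (they appear under tfds.deprecated.text), which would explain the AttributeError. The sketch below sidesteps the library entirely: it builds the same kind of vocabulary set with a plain regular-expression tokenizer. This is a swapped-in approximation, not the TFDS tokenizer, and the two review strings are made up.

```python
import re

def tokenize(text):
    # Keep runs of ASCII letters and digits; an approximation of an
    # alphanumeric-only tokenizer, not the TFDS implementation.
    return re.findall(r"[A-Za-z0-9]+", text)

# Made-up reviews standing in for reviews['data']['review_body'].
reviews = ["Great headphones, great price!", "Battery died after 2 days."]

vocabulary_set = set()
for review_body in reviews:
    vocabulary_set.update(tokenize(review_body))

vocab_size = len(vocabulary_set)
print(vocab_size)
```

When the review text comes out of a dataset as bytes (for example from .numpy()), it would need to be decoded with .decode("utf-8") before being passed to tokenize.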

python keras tensorflow tensorflow-datasets nltokenizer

Kyo*_*ogo

2020 12-23

3
Votes

1
Answers

1595
Views