Create your own tfds.load

Question

This is what I have:

import tensorflow_datasets as tfds

# Ratings data.
ratings = tfds.load('movie_lens/100k-ratings', split="train")
# Features of all the available movies.
movies = tfds.load('movie_lens/100k-movies', split="train")

# Select the basic features.
ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"]
})
movies = movies.map(lambda x: x["movie_title"])


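I don't want to download MovieLens; I want to use my own dataset instead. I tried reading it with pandas. Unfortunately, a DataFrame has no map(...) method. Is there a way to read my .csv files and get the same kind of object that tfds.load(...) returns?

This is what I tried: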

import pandas as pd

# Ratings data.
ratings = pd.read_csv('/content/drive/My Drive/Dataset/Test/ratings.csv')
movies = pd.read_csv('/content/drive/My Drive/Dataset/Test/movies.csv')

Error:

AttributeError: 'DataFrame' object has no attribute 'map'


Answer 1

Nic*_*ais 4

You don't need your own tfds.load. It simply returns a tf.data.Dataset object, which you can easily build yourself. For example:

import pandas as pd
import tensorflow as tf

df = pd.read_csv('https://raw.githubusercontent.com/mwas'
                 'kom/seaborn-data/master/iris.csv')

ds = (
    tf.data.Dataset.from_tensor_slices(dict(df))
    .map(lambda x: x['sepal_width'])
    .batch(4)
)

next(iter(ds))


<tf.Tensor: shape=(4,), dtype=float64, numpy=array([3.5, 3. , 3.2, 3.1])>
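Applied to the question, the same pattern might look like the sketch below. It uses small in-memory DataFrames as stand-ins for ratings.csv and movies.csv, and the column names (user_id, movie_title, rating) are assumptions about those files, not something the question confirms.

```python
import pandas as pd
import tensorflow as tf

# Stand-ins for pd.read_csv('.../ratings.csv') and pd.read_csv('.../movies.csv').
# The column names are assumed; adjust them to match your own files.
ratings_df = pd.DataFrame({
    "user_id": ["1", "2", "1"],
    "movie_title": ["Alien", "Heat", "Heat"],
    "rating": [5.0, 3.0, 4.0],
})
movies_df = pd.DataFrame({"movie_title": ["Alien", "Heat"]})

# dict(df) maps each column name to its values, so every dataset element
# is a dict of scalars, much like the elements tfds.load yields.
ratings = tf.data.Dataset.from_tensor_slices(dict(ratings_df))
movies = tf.data.Dataset.from_tensor_slices(dict(movies_df))

# From here on, the code from the question works unchanged.
ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
})
movies = movies.map(lambda x: x["movie_title"])
```

String columns come back as tf.string tensors (bytes once converted with .numpy()), so decode them if you need Python strings.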

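From the docs:

Note: Do not confuse TFDS (this library) with tf.data (TensorFlow API to build efficient data pipelines). TFDS is a high level wrapper around tf.data. If you're not familiar with this API, we encourage you to read the official tf.data guide first.

Read more about tf.data.Dataset.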
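If you would rather skip pandas entirely, tf.data can also read a CSV file directly. The following is a minimal sketch using tf.data.experimental.make_csv_dataset; it writes a tiny temporary file so that it is self-contained, and the file contents and column names are assumptions made for illustration.

```python
import os
import tempfile

import tensorflow as tf

# Write a tiny CSV so the sketch runs on its own. In practice, point
# csv_path at your own file; the columns here are assumed.
csv_path = os.path.join(tempfile.mkdtemp(), "ratings.csv")
with open(csv_path, "w") as f:
    f.write("user_id,movie_title,rating\n"
            "1,Alien,5.0\n"
            "2,Heat,3.0\n")

# Each element is a dict mapping column names to a batch of values.
ratings = tf.data.experimental.make_csv_dataset(
    csv_path,
    batch_size=2,
    select_columns=["user_id", "movie_title"],  # keep only these columns
    num_epochs=1,    # read the file once instead of repeating forever
    shuffle=False,   # keep the file order
)

batch = next(iter(ratings))
```

Unlike from_tensor_slices, this dataset is already batched and its column types are inferred from the file (user_id becomes an integer here). Call .unbatch() if you want one row per element.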

Here is how to use a tf.data.Dataset in a learning task:

import pandas as pd
import tensorflow as tf

df = pd.read_csv('https://raw.githubusercontent.com/mwas'
                 'kom/seaborn-data/master/iris.csv')

labels = ['versicolor', 'setosa', 'virginica']

ds = (
    tf.data.Dataset.from_tensor_slices(
        (df.drop('species', axis=1).values, df['species'].values))
    .shuffle(150)
    # Convert each species string to its integer index in `labels`.
    .map(lambda x, y: (x, tf.where(tf.equal(labels, y))[0]))
    .batch(4)
)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')])

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(ds, epochs=50)

