How to handle pipelines in TensorFlow vs. sklearn

Irv*_*rez 5 python deep-learning tensorflow

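I've just started getting into deep learning, and I'm trying to implement pipelines for a cleaner data flow. I come from a scikit-learn background, where pipelines are very simple: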

from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', RandomForestClassifier(n_estimators=50))
])
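A minimal sketch of how such a pipeline is used end to end, assuming toy data from `make_classification` (the data, the train/test split and the `random_state` values are illustrative assumptions, not part of the question). It shows the property that makes `Pipeline` convenient: `fit` learns the scaler statistics from the training split only, and `score`/`predict` reuse those statistics instead of refitting on new data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 500 samples with 8 features
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# fit() fits the scaler on X_train, transforms X_train, then fits the forest on the result
pipe.fit(X_train, y_train)

# score() transforms X_test with the scaler statistics learned from X_train, then predicts
accuracy = pipe.score(X_test, y_test)

# Per-feature means stored by the fitted scaler step
scaler_means = pipe.named_steps["scaler"].mean_
```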

You just declare your transformations and append a model to fit at the end. With TensorFlow's tf.data, on the other hand, it is much more cumbersome:

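import numpy as np
import tensorflow as tf

# Placeholder data for illustration: 1000 samples, 8 features, binary labels
X = np.random.rand(1000, 8).astype("float32")
y = np.random.randint(0, 2, size=(1000,))

# Standard scaling: compute per-feature mean/variance once over the whole training set,
# because inside map() (applied before batch()) each call only sees a single sample
mean, variance = tf.nn.moments(tf.constant(X), axes=[0])

def preprocess(x, y):
    x = tf.cast(x, tf.float32)
    x = (x - mean) / tf.sqrt(variance + 1e-7)  # epsilon guards against constant features
    return x, y

batch_size = 32
dataset = tf.data.Dataset.from_tensor_slices((X, y))
dataset = dataset.map(preprocess).batch(batch_size)

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(8,)),
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(dataset, epochs=10)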
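One detail worth checking in the `preprocess` step: `map` runs before `batch`, so if `tf.nn.moments(x, axes=[0])` is called inside the mapped function, `x` is a single sample of shape `(8,)` and the "mean" is taken across that one row's features, not per feature over the dataset. The sketch below uses NumPy as a stand-in for `tf.nn.moments` (both compute the population variance) on made-up values to contrast the two kinds of statistics.

```python
import numpy as np

# Hypothetical data: 4 samples, 3 features on very different scales
X = np.array([
    [1.0, 100.0, 0.5],
    [2.0, 200.0, 0.1],
    [3.0, 300.0, 0.9],
    [4.0, 400.0, 0.3],
], dtype=np.float32)

def standardize_per_row(row):
    # What moments(x, axes=[0]) yields when x is one sample:
    # a single mean/variance across that row's features
    return (row - row.mean()) / row.std()

per_row = np.stack([standardize_per_row(r) for r in X])

# Dataset-level (StandardScaler-style): one mean/std per feature column, computed once
mean, std = X.mean(axis=0), X.std(axis=0)
per_column = (X - mean) / std
```

Only `per_column` ends up with zero mean and unit variance in every feature; `per_row` keeps the large second feature consistently positive, so the features are not standardized at all.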


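My question is specifically about how to implement this pipeline (not the model), since it seems you have to keep chaining methods onto the dataset to complete it. I know a TensorFlow model can be combined into an sklearn pipeline through the Keras wrapper: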

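# Note: tensorflow.keras.wrappers.scikit_learn is deprecated and has been removed from
# recent TensorFlow releases; SciKeras (scikeras.wrappers.KerasClassifier) is the suggested replacement.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

def create_model():
    # Builds and compiles a fresh model each time the wrapper fits
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', KerasClassifier(build_fn=create_model, epochs=10, batch_size=32))
])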
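For comparison, a TensorFlow-native way to get sklearn-like fit-then-transform behaviour is to move the scaling into the model with `tf.keras.layers.Normalization` (a core layer in recent TF 2.x versions; older releases had it under `tf.keras.layers.experimental.preprocessing`), whose `adapt()` plays roughly the role of `StandardScaler.fit`. This is a minimal sketch assuming random placeholder data with 8 features; `tf.data` is then left with only shuffling, batching and prefetching.

```python
import numpy as np
import tensorflow as tf

# Hypothetical placeholder data: 256 samples, 8 features, binary labels
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(256, 8)).astype("float32")
y = (X[:, 0] > 5.0).astype("float32")

# adapt() computes per-feature mean and variance from the data (like StandardScaler.fit)
normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(X)

# Scaling is the first layer, so it is applied to every input the model sees
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    normalizer,
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# tf.data only handles input plumbing here: shuffle, batch, prefetch
dataset = (
    tf.data.Dataset.from_tensor_slices((X, y))
    .shuffle(256, seed=0)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)
history = model.fit(dataset, epochs=2, verbose=0)
```

Because the learned mean and variance are stored as (non-trainable) weights of the model, saving the model also saves the preprocessing, somewhat like pickling a fitted sklearn `Pipeline`.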


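So now I'm wondering what the best practices are in terms of code readability, performance, optimization and so on. As far as I know, TensorFlow is highly optimized, so I don't know what the impact would be if I used the sklearn API; then again, I think sklearn pipelines offer better functionality and are easy to integrate with other machine learning models. So should I stick with tf.data, or is combining sklearn with TensorFlow a good option?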

Archived:	2 years, 10 months ago
Views:	220
Last activity:	2 years, 10 months ago