Applying TensorFlow Transform to transform/scale features in production

Overview

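I followed a guide to write my TF Records, in which I used tf.Transform to preprocess my features. Now I want to deploy my model, and I need to apply the same preprocessing function to real, live data.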

My approach

First, suppose I have 2 features:

features = ['amount', 'age']


I have the transform_fn produced by Apache Beam, stored in working_dir=gs://path-to-transform-fn/

I then load the transform function using:

tf_transform_output = tft.TFTransformOutput(working_dir)

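I think the simplest way to serve in production is to obtain an array of processed data and then call model.predict() (I am using a Keras model).

To do that, I thought the transform_raw_features() method was exactly what I needed.

However, it seems that after building the schema: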

raw_features = {}
for k in features:
    raw_features.update({k: tf.constant(1)})

print(tf_transform_output.transform_raw_features(raw_features))

I get:

AttributeError: 'Tensor' object has no attribute 'indices'

I assume this happens because I used tf.VarLenFeature() when defining the schema for my preprocessing_fn.

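def preprocessing_fn(inputs):
    # Copy so the raw inputs dict is not modified in place.
    outputs = inputs.copy()

    # Standardize each feature to zero mean and unit variance.
    for name in features:
        outputs[name] = tft.scale_to_z_score(outputs[name])

    return outputs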

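To make concrete what the preprocessing above has to reproduce at serving time, here is a minimal pure-Python sketch of z-score scaling: statistics are computed once over training data and then reused on a single live record. It does not use TensorFlow Transform at all. The training rows, the helper names `fit_z_score_stats` and `apply_z_score`, the use of the population standard deviation, and the zero-variance guard are all assumptions made for illustration.

```python
from statistics import mean, pstdev

features = ['amount', 'age']

# Hypothetical training rows. In the real pipeline these statistics come
# from the full-pass analysis that tf.Transform runs in Apache Beam.
training_rows = [
    {'amount': 10.0, 'age': 20.0},
    {'amount': 20.0, 'age': 30.0},
    {'amount': 30.0, 'age': 40.0},
]


def fit_z_score_stats(rows, names):
    """Return {feature: (mean, population std)} over the training rows."""
    stats = {}
    for name in names:
        values = [row[name] for row in rows]
        stats[name] = (mean(values), pstdev(values))
    return stats


def apply_z_score(record, stats):
    """Scale one raw record using the stored training statistics."""
    scaled = {}
    for name, (mu, sigma) in stats.items():
        # Zero-variance guard: a choice made for this sketch only.
        scaled[name] = (record[name] - mu) / sigma if sigma > 0 else 0.0
    return scaled


stats = fit_z_score_stats(training_rows, features)
scaled = apply_z_score({'amount': 20.0, 'age': 40.0}, stats)
print(scaled)
```

The point of the sketch is that the live record is scaled with the training-time mean and standard deviation, never with statistics recomputed from the serving data; the saved transform function plays that role in the real setup.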

I build the metadata using:

RAW_DATA_FEATURE_SPEC = {}
for name in features:
    RAW_DATA_FEATURE_SPEC[name] = tf.VarLenFeature(dtype=tf.float32)

# Build the metadata once, after the feature spec is complete.
RAW_DATA_METADATA = dataset_metadata.DatasetMetadata(
    dataset_schema.from_feature_spec(RAW_DATA_FEATURE_SPEC))

So in short, given a dictionary:

d = …

python tensorflow tensorflow-serving apache-beam tensorflow-transform

use*_*178

2019 01-24

Score: 9, answers: 1, views: 568

Pandas rolling on a DateTime MultiIndex frame

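There are similar questions, but my datetime values are very sparse and unsorted, i.e. they are random timestamps. Basically, what I need is to use rolling() over the second index level while respecting the groups (the first index level).

There is a very similar GitHub issue that you may also want to take part in: https://github.com/pandas-dev/pandas/issues/15584

Code to reproduce: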

import pandas as pd
data = {
    'id': ['A','A','A','B'],
    'time': pd.to_datetime(['2018-01-04 08:13:51.181','2018-01-04 08:13:55.181','2018-01-04 09:13:51.181', '2018-01-04 08:13:51.183']),
    'colA': [4,3,2,1],
    '30min_rolling_output': [4,7,2,1],
    '1day_rolling_output': [4,7,9,1]
}
test_df = pd.DataFrame(data=data).set_index(['id', 'time'])

The desired output is given in the last two columns, assuming 30-minute and 1-day windows.

Visualization:

                            colA  30min_rolling_output  1day_rolling_output
id time
A  2018-01-04 08:13:51.181     4                     4                    4
   2018-01-04 08:13:55.181     3                     7                    7
   2018-01-04 09:13:51.181     2                     2                    9
B  2018-01-04 08:13:51.183     1                     1                    1

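The window semantics behind the desired output can be spelled out in plain Python as a reference, without relying on any pandas API. The sketch below assumes that each window is the right-closed interval (t - window, t] and that rows only see other rows of the same id; the function name `grouped_rolling_sum` is hypothetical. It is quadratic in the number of rows, so it is meant for checking results on small data, not as a replacement for a pandas solution.

```python
from datetime import datetime, timedelta

# Same data as the reproduction code: (id, time, colA).
rows = [
    ('A', datetime(2018, 1, 4, 8, 13, 51, 181000), 4),
    ('A', datetime(2018, 1, 4, 8, 13, 55, 181000), 3),
    ('A', datetime(2018, 1, 4, 9, 13, 51, 181000), 2),
    ('B', datetime(2018, 1, 4, 8, 13, 51, 183000), 1),
]


def grouped_rolling_sum(rows, window):
    """For each row, sum the values of rows in the same group whose
    timestamp falls in the interval (t - window, t].

    The input does not need to be sorted by time.
    """
    result = []
    for group, t, _ in rows:
        total = sum(
            value
            for other_group, other_t, value in rows
            if other_group == group and t - window < other_t <= t
        )
        result.append(total)
    return result


rolling_30min = grouped_rolling_sum(rows, timedelta(minutes=30))
rolling_1day = grouped_rolling_sum(rows, timedelta(days=1))
print(rolling_30min)  # [4, 7, 2, 1]
print(rolling_1day)   # [4, 7, 9, 1]
```

Both lists match the 30min_rolling_output and 1day_rolling_output columns in the table, which suggests the expected values follow a per-group, right-closed time window.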

python pandas

use*_*178

lucky-day

Score: 5, answers: 1, views: 1305