Posts by Vam*_*ala

Unable to save a TensorFlow model file to HDFS

I have set up everything the TensorFlow-on-Hadoop environment requires on my Spark cluster, following https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/hadoop.md

I took a sample script that trains a model; I want to save the model as an HDF5 file to HDFS and then reload it.

Below is the sample code I am using:

from keras.models import Sequential, load_model
from keras.layers import Dense  # Dense is used below but was missing from the imports
import numpy
import os

# Fix the random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# Load the Pima Indians diabetes dataset: 8 feature columns plus a binary label
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")

X = dataset[:,0:8]
Y = dataset[:,8]

# Simple fully connected binary classifier
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X, Y, epochs=150, batch_size=10, verbose=0)

Saving the model to a local directory works; I can save and restore it there:

model.save("temp_model.h5") #### <----- this saves the model to the local directory of my machine

del model

model = load_model("temp_model.h5") …
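For what it's worth, Keras's HDF5 model.save() generally writes only to local filesystem paths, not hdfs:// URIs. Below is a minimal sketch of a common workaround, assuming TensorFlow 1.14+ (for tf.io.gfile) built with the HDFS support described in the linked guide; the hdfs:// destination is a hypothetical path:

import tensorflow as tf

local_path = "temp_model.h5"
hdfs_path = "hdfs://namenode/user/me/temp_model.h5"  # hypothetical HDFS destination

model.save(local_path)  # 'model' is the trained model from above; HDF5 save stays local
tf.io.gfile.copy(local_path, hdfs_path, overwrite=True)  # push the file to HDFS

# To restore later: pull the file back down and load the local copy
tf.io.gfile.copy(hdfs_path, local_path, overwrite=True)
model = load_model(local_path)

tf.io.gfile only reaches HDFS when the Hadoop environment variables (CLASSPATH, LD_LIBRARY_PATH, etc.) from the linked guide are set on every node.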

python hdfs tensorflow

5 votes · 0 answers · 642 views

ValueError: Error when checking target: expected dense_2 to have shape (1,) but got array with shape (14,)

I am trying to train a classification model in a distributed fashion using TensorFlowOnSpark, the library developed by Yahoo, and I am following the example from the GitHub link.

I am using a dataset other than MNIST (the one used in the example from the GitHub link). After preprocessing, my dataset has dimensions (260000, 28047), and the classes (labels) range from 0 to 13.

>>> import os
>>> import tensorflow as tf
>>> from tensorflow.python import keras
>>> from tensorflow.python.keras import backend as K
>>> from tensorflow.python.keras.models import Sequential, load_model, save_model
>>> from tensorflow.python.keras.layers import Dense, Dropout
>>> from tensorflow.python.keras.callbacks import LambdaCallback, TensorBoard
>>> from tensorflow.python.saved_model import builder as saved_model_builder
>>> from tensorflow.python.saved_model import tag_constants
>>> from tensorflow.python.saved_model.signature_def_utils_impl import predict_signature_def
>>> from tensorflowonspark import TFNode
>>> from pyspark.context import SparkContext
>>> …
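This particular error usually means the output layer and loss do not match the label encoding: a final Dense(1) expects one target per sample, while one-hot labels for 14 classes have shape (14,). A minimal sketch of the two consistent setups, assuming 28047 input features and 14 classes; the hidden-layer width is illustrative:

from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Dense

num_features, num_classes = 28047, 14  # dimensions from the question

model = Sequential()
model.add(Dense(128, input_dim=num_features, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))  # 14 output units, not 1

# Option 1: integer labels of shape (n,) in the range 0..13
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# Option 2: one-hot labels of shape (n, 14) with loss='categorical_crossentropy',
# e.g. after converting with keras' to_categorical(y, num_classes)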

python keras tensorflow

3 votes · 1 answer · 4821 views

Min-max normalizing a dataframe column with a groupby on another column

The dataframe looks like this:

Name     Job      Salary
john   painter    40000
peter  engineer   50000
sam     plumber   30000
john    doctor    500000
john    driver    20000
sam    carpenter  10000
peter  scientist  100000

How can I group by the "Name" column and apply min-max normalization to the "Salary" column within each group?

Expected result:

Name     Job      Salary
john   painter    0.041666
peter  engineer   0
sam     plumber   1
john    doctor    1
john    driver    0
sam    carpenter  0
peter  scientist  1

I have tried the following:

data = df.groupby('Name').transform(lambda x: (x - x.min()) / x.max()- x.min())

However, this produces:

         Salary
0 -19999.960000
1 -50000.000000
2  -9999.333333
3 -19999.040000
4 -20000.000000
5 -10000.000000
6 -49999.500000
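The denominator is missing parentheses, so the expression divides by x.max() and then subtracts x.min() instead of dividing by the range (max - min). A minimal self-contained sketch of the fix:

import pandas as pd

df = pd.DataFrame({
    'Name': ['john', 'peter', 'sam', 'john', 'john', 'sam', 'peter'],
    'Job': ['painter', 'engineer', 'plumber', 'doctor', 'driver', 'carpenter', 'scientist'],
    'Salary': [40000, 50000, 30000, 500000, 20000, 10000, 100000],
})

# Parenthesize the whole denominator so it divides by (max - min), not by max
df['Salary'] = df.groupby('Name')['Salary'].transform(
    lambda x: (x - x.min()) / (x.max() - x.min())
)
print(df)  # john/painter -> 0.041666..., peter/engineer -> 0.0, sam/plumber -> 1.0, ...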

python dataframe pandas pandas-groupby

2 votes · 1 answer · 4052 views

How to split the CSV values in a cell of a PySpark dataframe into separate new columns and their values

My current Spark dataframe has CSV values at the cell level in one of its columns, and I am trying to explode them into new columns. Sample dataframe:

    a_id                                    features
1   2020     "a","b","c","d","constant1","1","0.1","aa"
2   2021     "a","b","c","d","constant2","1","0.2","ab"
3   2022     "a","b","c","d","constant3","1","0.3","ac","a","b","c","d","constant3","1.1","3.3","acx"
4   2023     "a","b","c","d","constant4","1","0.4","ad"
5   2024     "a","b","c","d","constant5","1","0.5","ae","a","b","c","d","constant5","1.2","6.3","xwy","a","b","c","d","constant5","2.2","8.3","bunr"
6   2025     "a","b","c","d","constant6","1","0.6","af"

The features column holds multiple CSV records in which (a, b, c, d) act as headers; these repeat within some cells (rows 3 and 5), and I want to extract each header only once along with its respective values. The expected output dataframe is shown below.

Output Spark dataframe:

    a_id       a        d
1   2020   constant1   ["aa"]
2   2021   constant2   ["ab"]
3   2022   constant3   ["ac","acx"]
4   2023   constant4   ["ad"]
5   2024   constant5   ["ae","xwy","bunr"]
6   2025   constant6   ["af"]

As shown, I want to extract only the a and d headers as new columns, where a holds the constant and d collects its multiple values into a list.

Please help with how to do this transformation in PySpark; a sketch of one approach follows. Note that the dataframe above is a real-time streaming dataframe.
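One way to sketch this, assuming every record inside features is exactly 8 comma-separated quoted tokens (the 4 header names a, b, c, d followed by their 4 values) and Spark 2.4+ for the transform/sequence higher-order functions; df and the column names are taken from the question:

from pyspark.sql import functions as F

# Strip the quotes and split the cell into an array of tokens
df2 = df.withColumn('toks', F.split(F.regexp_replace('features', '"', ''), ','))

result = df2.select(
    'a_id',
    F.col('toks')[4].alias('a'),  # 5th token = value under header "a" (the constant)
    # One record per 8 tokens; collect the 8th token (value under header "d") of each record
    F.expr("transform(sequence(0, size(toks) div 8 - 1), i -> toks[i * 8 + 7])").alias('d'),
)

These are all narrow, per-row transformations, so the same select should also apply to a streaming dataframe.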

python apache-spark pyspark

2 votes · 1 answer · 166 views

Extracting anomalies from a dataframe column

I am exploring the data I have, and I have found many anomalies in it. The date column of the dataframe contains dates like "12012-09-14" and "2500-09-28". I want to replace them with "2250-05-05".

I want to keep the valid dates in df1 and collect the invalid ones in a list.

df1:

col col2        date 
1   b1a2         NaN 
2   bal2  12012-09-14 
3   a3l2  12017-09-14 
4   a5l2  2019-09-24 
5   a8l2  2012-09-28 
6   a1l2  12113-09-14 
7   a0l2  12012-09-24 
8   a2l2  2500-09-28 
9   a6l2  2500-09-14 
10  a5l2  2012-09-24 

Can someone help me extract those invalid dates?

Expected output:

    col col2    date
0    1  b1a2 2250-05-05
1    2  bal2 2250-05-05
2    3  a3l2 2250-05-05
3    4  a5l2 2019-09-24
4    5  a8l2 2012-09-28
5    6  a1l2 2250-05-05
6    7  a0l2 2250-05-05
7    8  a2l2 2250-05-05
8    9  a6l2 2250-05-05 …
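A minimal sketch of one way to do this, assuming "invalid" means any value pandas cannot parse as a real timestamp (five-digit years and years past 2262 both fall outside the pandas Timestamp range, so errors='coerce' turns them into NaT):

import pandas as pd

df1 = pd.DataFrame({
    'col': range(1, 11),
    'col2': ['b1a2', 'bal2', 'a3l2', 'a5l2', 'a8l2', 'a1l2', 'a0l2', 'a2l2', 'a6l2', 'a5l2'],
    'date': [None, '12012-09-14', '12017-09-14', '2019-09-24', '2012-09-28',
             '12113-09-14', '12012-09-24', '2500-09-28', '2500-09-14', '2012-09-24'],
})

# Unparseable or out-of-range dates become NaT
parsed = pd.to_datetime(df1['date'], format='%Y-%m-%d', errors='coerce')

# Keep the invalid raw values in a list (dropping the original NaN)
invalid = df1.loc[parsed.isna(), 'date'].dropna().tolist()

# Replace anything invalid (including the original NaN) with the sentinel date
df1['date'] = parsed.dt.strftime('%Y-%m-%d').fillna('2250-05-05')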

python pandas data-cleaning

1 vote · 1 answer · 55 views