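I have set up everything the TensorFlow Hadoop environment requires on my Spark cluster, as described in https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/hadoop.md
I took a sample script that trains a model, and I want to save the model (HDF5, via h5py) into HDFS and then reload it.
Below is the sample code I am using: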
from keras.models import Sequential, load_model
from keras.layers import Dense  # was missing: Dense is used below
import numpy
import os
# fix the random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# Pima Indians diabetes data: 8 feature columns followed by the 0/1 label
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
X = dataset[:, 0:8]
Y = dataset[:, 8]
# small fully connected binary classifier
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=150, batch_size=10, verbose=0)
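A minimal sketch of one common workaround for getting the trained model into HDFS, assuming the `hdfs` command-line client is installed and configured on the machine running the code. h5py, which Keras uses for `.h5` files, expects a local file path rather than an `hdfs://` URL, so the sketch saves to a temporary local file and uploads it with `hdfs dfs -put`; reloading reverses the steps with `hdfs dfs -get`. The helper names `save_model_to_hdfs` and `load_model_from_hdfs` are hypothetical, not part of Keras.

```python
import os
import subprocess
import tempfile


def save_model_to_hdfs(model, hdfs_path):
    """Hypothetical helper: save a Keras model as HDF5 locally, then upload it."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "model.h5")
        model.save(local_path)  # h5py writes to the local filesystem
        # -f overwrites any existing file at the HDFS destination
        subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_path], check=True)


def load_model_from_hdfs(hdfs_path):
    """Hypothetical helper: download an HDF5 model from HDFS and load it."""
    from keras.models import load_model  # imported lazily, only when loading

    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "model.h5")
        subprocess.run(["hdfs", "dfs", "-get", hdfs_path, local_path], check=True)
        # load_model reads the whole file before the temporary directory is removed
        return load_model(local_path)
```

For example, `save_model_to_hdfs(model, "hdfs://namenode:8020/user/me/temp_model.h5")`, where that HDFS URL is only a placeholder for your cluster's path.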
Now I am trying to save the model to a local directory, where I can save and restore it:
model.save("temp_model.h5")  # <-- this saved the model to the local directory of my machine
del model
model = load_model("temp_model.h5") …
I am trying to train a classification model in a distributed way. I am using the TensorFlowOnSpark library developed by Yahoo, and I am following the example from the GitHub link.
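I am using a dataset other than MNIST, which is the dataset used in the example at that GitHub link. After preprocessing, my dataset has dimensions (260000, 28047), and the class labels span the range 0:13.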
import os
import tensorflow as tf
from tensorflow.python import keras
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.models import Sequential, load_model, save_model
from tensorflow.python.keras.layers import Dense, Dropout
from tensorflow.python.keras.callbacks import LambdaCallback, TensorBoard
from tensorflow.python.saved_model import builder as saved_model_builder
from tensorflow.python.saved_model import tag_constants
from tensorflow.python.saved_model.signature_def_utils_impl import predict_signature_def
from tensorflowonspark import TFNode
from pyspark.context import SparkContext
…
The dataframe is shown below:
Name Job Salary
john painter 40000
peter engineer 50000
sam plumber 30000
john doctor 500000
john driver 20000
sam carpenter 10000
peter scientist 100000
How can I group by the "Name" column and apply min-max normalization to the "Salary" column within each group?
Expected result:
Name Job Salary
john painter 0.041666
peter engineer 0
sam plumber 1
john doctor 1
john driver 0
sam carpenter 0
peter scientist 1
I tried the following:
data = df.groupby('Name').transform(lambda x: (x - x.min()) / x.max()- x.min())
However, this produces:
Salary
0 -19999.960000
1 -50000.000000
2 -9999.333333
3 -19999.040000
4 -20000.000000
5 -10000.000000
6 -49999.500000
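The attempt above is missing the parentheses around the denominator, so it divides by `x.max()` alone and then subtracts `x.min()` from the quotient, which is where the large negative numbers come from. A minimal sketch of the corrected min-max normalization, rebuilding the sample data from the question; selecting the Salary column before `transform` leaves the Job strings untouched:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["john", "peter", "sam", "john", "john", "sam", "peter"],
    "Job": ["painter", "engineer", "plumber", "doctor", "driver", "carpenter", "scientist"],
    "Salary": [40000, 50000, 30000, 500000, 20000, 10000, 100000],
})

# Min-max normalization within each Name group: (x - min) / (max - min).
df["Salary"] = df.groupby("Name")["Salary"].transform(
    lambda x: (x - x.min()) / (x.max() - x.min())
)
print(df)
```

A group whose salaries are all equal (including a single-row group) has max - min = 0 and yields NaN, so that case may need separate handling.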
The current Spark dataframe has CSV values at the cell level in one column, and I am trying to break them out into new columns. Sample dataframe:
a_id features
1 2020 "a","b","c","d","constant1","1","0.1","aa"
2 2021 "a","b","c","d","constant2","1","0.2","ab"
3 2022 "a","b","c","d","constant3","1","0.3","ac","a","b","c","d","constant3","1.1","3.3","acx"
4 2023 "a","b","c","d","constant4","1","0.4","ad"
5 2024 "a","b","c","d","constant5","1","0.5","ae","a","b","c","d","constant5","1.2","6.3","xwy","a","b","c","d","constant5","2.2","8.3","bunr"
6 2025 "a","b","c","d","constant6","1","0.6","af"
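The features column holds several CSV values, where a, b, c and d act as headers. The headers repeat in some cells (rows 3 and 5), and I want to extract each header only once, together with its respective values. The expected output dataframe is shown below.
Output Spark dataframe: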
a_id a d
1 2020 constant1 ["aa"]
2 2021 constant2 ["ab"]
3 2022 constant3 ["ac","acx"]
4 2023 constant4 ["ad"]
5 2024 constant5 ["ae","xwy","bunr"]
6 2025 constant6 ["af"]
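As shown, I only want to extract the a and d headers as new columns, where a is a constant and d can have several values, collected into a list.
Please help me with how to transform this in PySpark. The dataframe above is a real-time streaming dataframe.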
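A minimal sketch, assuming each cell is one string of quoted, comma-separated values with no commas inside the quotes, and that every record is exactly eight fields (the four headers a, b, c, d followed by their four values, as in the sample). `parse_features` is plain Python, so it can be checked without Spark; `add_a_and_d` wraps it in a Python UDF, and UDFs can be applied to streaming dataframes as well. Both function names and the `RECORD_WIDTH`, `A_POS` and `D_POS` constants are hypothetical.

```python
import csv

RECORD_WIDTH = 8  # assumption: headers a, b, c, d followed by their 4 values
A_POS, D_POS = 4, 7  # positions of the a and d values inside one record


def parse_features(features):
    """Return (a, [d, ...]) for one cell of the features column."""
    if features is None:
        return None
    fields = next(csv.reader([features]))  # strips the surrounding quotes
    records = [fields[i:i + RECORD_WIDTH] for i in range(0, len(fields), RECORD_WIDTH)]
    a = records[0][A_POS]  # a is the same constant in every record
    d_values = [r[D_POS] for r in records if len(r) == RECORD_WIDTH]
    return a, d_values


def add_a_and_d(df):
    """Replace the features column with a and d columns (works on streams too)."""
    # pyspark is imported here so parse_features stays usable without Spark
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StringType, StructField, StructType

    schema = StructType([
        StructField("a", StringType()),
        StructField("d", ArrayType(StringType())),
    ])
    parse_udf = F.udf(parse_features, schema)
    return (
        df.withColumn("parsed", parse_udf("features"))
        .select("a_id", F.col("parsed.a").alias("a"), F.col("parsed.d").alias("d"))
    )
```

A Python UDF keeps the logic easy to read but is slower than built-in functions; if throughput matters, the same positional logic could be expressed with built-ins such as `split` and `filter`.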
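I am exploring my data and have found a lot of anomalies in it. The date column of the dataframe contains dates such as "12012-09-14" and "2500-09-28", and I want to replace them with "2250-05-05".
I want to keep the valid dates in df1 and collect the invalid dates in a list.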
df1:
col col2 date
1 b1a2 NaN
2 bal2 12012-09-14
3 a3l2 12017-09-14
4 a5l2 2019-09-24
5 a8l2 2012-09-28
6 a1l2 12113-09-14
7 a0l2 12012-09-24
8 a2l2 2500-09-28
9 a6l2 2500-09-14
10 a5l2 2012-09-24
Can someone help me extract those invalid dates?
Expected output:
col col2 date
0 1 b1a2 2250-05-05
1 2 bal2 2250-05-05
2 3 a3l2 2250-05-05
3 4 a5l2 2019-09-24
4 5 a8l2 2012-09-28
5 6 a1l2 2250-05-05
6 7 a0l2 2250-05-05
7 8 a2l2 2250-05-05
8 9 a6l2 2250-05-05 …
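A minimal sketch, assuming "valid" means a YYYY-MM-DD string whose date falls inside pandas' datetime64[ns] range (1677-09-21 to 2262-04-11). That rule is an assumption inferred from the expected output, which rejects both five-digit years and 2500-09-28; adjust `is_valid` if the real rule differs. Missing values are replaced too, as in the expected output, but are left out of the list of invalid dates. The `is_valid` helper and the `REPLACEMENT`, `LOWER` and `UPPER` names are hypothetical.

```python
from datetime import datetime

import numpy as np
import pandas as pd

REPLACEMENT = "2250-05-05"
# Assumption: valid = YYYY-MM-DD inside pandas' datetime64[ns] range.
LOWER = pd.Timestamp.min.date()  # 1677-09-21
UPPER = pd.Timestamp.max.date()  # 2262-04-11


def is_valid(value):
    if not isinstance(value, str):
        return False  # NaN / missing value
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        return False  # %Y takes exactly four digits, so 12012-09-14 fails here
    return LOWER <= parsed <= UPPER


df1 = pd.DataFrame({
    "col": range(1, 11),
    "col2": ["b1a2", "bal2", "a3l2", "a5l2", "a8l2", "a1l2", "a0l2", "a2l2", "a6l2", "a5l2"],
    "date": [np.nan, "12012-09-14", "12017-09-14", "2019-09-24", "2012-09-28",
             "12113-09-14", "12012-09-24", "2500-09-28", "2500-09-14", "2012-09-24"],
})

valid = df1["date"].map(is_valid)
# collect the invalid (non-missing) dates before overwriting them
invalid_dates = df1.loc[~valid & df1["date"].notna(), "date"].tolist()
# keep valid dates; replace invalid and missing ones with the placeholder
df1["date"] = df1["date"].where(valid, REPLACEMENT)
print(invalid_dates)
print(df1)
```

The date column stays as strings; `pd.to_datetime(df1["date"])` can convert it afterwards, since every remaining value fits the nanosecond range.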