How do I rescale new data with a previously fitted MinMaxScaler?

Question

Sha*_*anN 3 python scaletransform scikit-learn

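I have been stuck on the problem of scaling new data. In my setup, I have already trained and tested the model, and all of x_train and x_test were scaled with sklearn's MinMaxScaler(). Now that I am applying the model in a real-time process, how do I scale new inputs to the same scale as the training and test data? The steps are as follows: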

featuresData = df[features].values # Array of all features with the length of thousands
sc = MinMaxScaler(feature_range=(-1,1), copy=False)
featuresData = sc.fit_transform(featuresData)

#Running model to make the final model
model.fit(X,Y)
model.predict(X_test)

#Saving to abcxyz.h5

Then, deployment with new data:

#load the model abcxyz.h5
#catching new data 
#Scaling new data to put into the loaded model << I'm stuck at this step
#...

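So how do I scale the new data for prediction, and then inverse-transform to get the final result? By my reasoning, the new data has to be scaled the same way the old scaler scaled the data before the model was trained.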

Please help!

Answer 1

Pra*_*iel 5

Given the way you are using scikit-learn, you need to have saved the fitted transformer:

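import joblib
from sklearn.preprocessing import MinMaxScaler
# ...
sc = MinMaxScaler(feature_range=(-1,1), copy=False)
featuresData = sc.fit_transform(featuresData)

# save the fitted scaler next to the model
joblib.dump(sc, 'sc.joblib')

# with new data: load the fitted scaler and call transform only (do not fit again)
sc = joblib.load('sc.joblib')
transformData = sc.transform(newData)
# ...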

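The save/load round trip can be checked on a small example. This is only a sketch: the arrays, the temporary file path, and the variable names are made up for illustration, and it uses the default `copy=True` so the original arrays stay untouched. It also shows `inverse_transform`, which maps scaled values back to the original units using the same fitted scaler.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training features: two columns with different ranges.
X_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])

# Fit the scaler on the training data only.
scaler = MinMaxScaler(feature_range=(-1, 1))
X_train_scaled = scaler.fit_transform(X_train)

# Persist the fitted scaler (illustrative temporary path).
scaler_path = os.path.join(tempfile.mkdtemp(), "sc.joblib")
joblib.dump(scaler, scaler_path)

# Later, in the real-time process: load the scaler and call transform only.
loaded = joblib.load(scaler_path)
X_new = np.array([[2.5, 25.0]])
X_new_scaled = loaded.transform(X_new)

# inverse_transform undoes the scaling with the same stored min/max.
X_new_restored = loaded.inverse_transform(X_new_scaled)

print(X_new_scaled)    # scaled with the training min/max
print(X_new_restored)  # back in the original units
```

Note that this scaler was fitted on the features, so it can only invert feature values. If the target was scaled too, a separate scaler fitted on the target would have to be saved and used to inverse-transform the predictions.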

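The best approach with scikit-learn is to combine the transformation with the model. That way, you only need to save a single model object that contains the transformation pipeline.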

from sklearn import svm
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline


clf = svm.SVC(kernel='linear')
sc = MinMaxScaler(feature_range=(-1,1), copy=False)

model = Pipeline([('scaler', sc), ('svc', clf)])

#...

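When you call model.fit, the pipeline first runs fit_transform on your scaler under the hood. When you call model.predict, the scaler's transform is applied to the input before it reaches the classifier.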
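
As a rough end-to-end sketch of that behaviour, the pipeline can be fitted, saved as one object, reloaded, and given raw (unscaled) new data. The toy data, the temporary file path, and the variable names are assumptions for illustration; the SVC is simply the estimator from the example above, not the asker's actual model.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Hypothetical, clearly separable toy data in raw (unscaled) units.
X_train = np.array([[0.0, 100.0], [1.0, 120.0], [9.0, 900.0], [10.0, 1000.0]])
y_train = np.array([0, 0, 1, 1])

model = Pipeline([
    ("scaler", MinMaxScaler(feature_range=(-1, 1))),
    ("svc", SVC(kernel="linear")),
])

# fit() fits the scaler on X_train, then trains the SVC on the scaled data.
model.fit(X_train, y_train)

# One file holds both the fitted scaler and the trained classifier.
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, model_path)

# In the real-time process: pass raw new data; scaling happens inside predict().
loaded = joblib.load(model_path)
X_new = np.array([[0.5, 110.0], [9.5, 950.0]])
predictions = loaded.predict(X_new)
print(predictions)
```

The fitted scaler remains reachable as `loaded.named_steps["scaler"]` if the scaled values themselves are needed.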
