mit*_*tsi 8 python scale pandas scikit-learn data-science
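I want to use several methods of StandardScaler from sklearn. Is it possible to apply them to only some columns/features of my dataset, rather than to the whole set?
For example, the set is data: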
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})
   Age  Name  Weight
0   18     3      68
1   92     4      59
2   98     6      49
col_names = ['Name', 'Age', 'Weight']
features = data[col_names]
I fit the StandardScaler and transform the data:
scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)
scaled_features = pd.DataFrame(features, columns = col_names)
       Name       Age    Weight
0 -1.069045 -1.411004  1.202703
1 -0.267261  0.623041  0.042954
2  1.336306  0.787964 -1.245657
But of course the names are not really floats but strings, and I don't want to standardize them. How can I apply the fit and transform methods only to the Age and Weight columns?
ayh*_*han 14
First, create a copy of the dataframe:
scaled_features = data.copy()
Don't include the Name column in the transformation:
col_names = ['Age', 'Weight']
features = scaled_features[col_names]
scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)
Now, instead of creating a new dataframe, assign the result back to those two columns:
scaled_features[col_names] = features
print(scaled_features)
        Age  Name    Weight
0 -1.411004     3  1.202703
1  0.623041     4  0.042954
2  0.787964     6 -1.245657
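As a sanity check on the numbers above: by default StandardScaler subtracts each column's mean and divides by its population standard deviation (ddof=0). The following is only a minimal sketch comparing a hand-computed version against the scaler on the same two columns; the variable names `manual` and `sk_scaled` are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})
col_names = ['Age', 'Weight']

# Hand-computed z-scores: (x - mean) / population std (ddof=0)
manual = (data[col_names] - data[col_names].mean()) / data[col_names].std(ddof=0)

# The same columns scaled by StandardScaler
sk_scaled = StandardScaler().fit_transform(data[col_names])

# Compare the two results element-wise
print(np.allclose(manual.to_numpy(), sk_scaled))
```

Note that pandas' own `.std()` defaults to ddof=1, so the explicit `ddof=0` is needed for the two results to agree.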
Late to the party, but here is my preferred solution:
import pandas as pd
from sklearn.preprocessing import StandardScaler

# load data
data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})
# list of columns to scale
cols_to_scale = ['Age', 'Weight']
# create the scaler and fit it on those columns only
scaler = StandardScaler()
scaler.fit(data[cols_to_scale])
# scale the selected columns in place
data[cols_to_scale] = scaler.transform(data[cols_to_scale])
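One practical reason to keep fit and transform as separate calls is that the fitted scaler can be reused on other rows with the same columns, and inverse_transform can map scaled values back to the original units. A rough sketch follows; `new_data` and its values are hypothetical and not part of the question.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})
cols_to_scale = ['Age', 'Weight']

# Fit on the original data, using only the selected columns
scaler = StandardScaler().fit(data[cols_to_scale])

# Hypothetical new rows, made up for illustration
new_data = pd.DataFrame({'Name': [7, 8], 'Age': [30, 60], 'Weight': [55, 70]})

# Reuse the already fitted scaler; Name is left untouched
new_scaled = new_data.copy()
new_scaled[cols_to_scale] = scaler.transform(new_data[cols_to_scale])

# Map the scaled columns back to their original units
restored = scaler.inverse_transform(new_scaled[cols_to_scale])

print(new_scaled)
print(restored)
```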
ColumnTransformer, introduced in v0.20, applies transformers to a specified set of columns of an array or pandas DataFrame.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})
col_names = ['Name', 'Age', 'Weight']
features = data[col_names]

ct = ColumnTransformer([
    ('somename', StandardScaler(), ['Age', 'Weight'])
], remainder='passthrough')

ct.fit_transform(features)
Note: like Pipeline, it also has a shorthand version, make_column_transformer, which doesn't require naming the transformers.
Output:
array([[-1.41100443,  1.20270298,  3.        ],
       [ 0.62304092,  0.04295368,  4.        ],
       [ 0.78796352, -1.24565666,  6.        ]])
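For the make_column_transformer shorthand, here is a minimal sketch. It assumes a reasonably recent scikit-learn, where each tuple is given as (transformer, columns). Since the result is a plain array with the transformed columns first and the passthrough columns after them, the sketch also rebuilds a DataFrame; the names `scaled_cols`, `passthrough_cols` and `scaled_df` are made up for illustration.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})

scaled_cols = ['Age', 'Weight']

# No transformer name needed; unlisted columns are passed through unchanged
ct = make_column_transformer(
    (StandardScaler(), scaled_cols),
    remainder='passthrough',
)
result = ct.fit_transform(data)

# Output order: transformed columns first, then the remaining columns
passthrough_cols = [c for c in data.columns if c not in scaled_cols]
scaled_df = pd.DataFrame(result, columns=scaled_cols + passthrough_cols, index=data.index)

print(scaled_df)
```

Keep in mind that the passed-through Name column comes back as float here, because everything is stacked into a single numeric array.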