Nig*_*ale 5 python dataframe pandas sklearn-pandas
I am using from sklearn.preprocessing import MinMaxScaler with the following code and dataset:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
"A" : [-0.5624105,
-0.5637749,
0.2973856,
0.619784,
0.007297921,
0.8146919,
0.1082434,
-0.2311236,
-0.6945567,
-0.6807524,
-0.1017431,
0.5889628,
0.5384794,
0.3906553,
0.3843442,
0.4408366,
0.4035791,
0.05258237,
-0.4847771
],
"B" : [-0.5068743,
0.1422121,
0.6444226,
0.4959088,
-0.2260773,
0.3420533,
0.2346546,
0.1177824,
-0.7701161,
-0.7566853,
-0.5091485,
0.4509938,
0.4209853,
0.304058,
0.3753832,
0.6958977,
0.6763205,
0.05536954,
-0.9857719
]})
min_max_scaler = MinMaxScaler(feature_range=(0,255))
print(df)
df[df.columns] = min_max_scaler.fit_transform(df[df.columns])
print(df)
print(type(df))
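I want to scale it using the minimum and the maximum of the whole dataset rather than of each column. How can I do that with the same code? Is it possible? This is the output before and after scaling: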
A B
0 -0.562411 -0.506874
1 -0.563775 0.142212
2 0.297386 0.644423
3 0.619784 0.495909
4 0.007298 -0.226077
5 0.814692 0.342053
6 0.108243 0.234655
7 -0.231124 0.117782
8 -0.694557 -0.770116
9 -0.680752 -0.756685
10 -0.101743 -0.509149
11 0.588963 0.450994
12 0.538479 0.420985
13 0.390655 0.304058
14 0.384344 0.375383
15 0.440837 0.695898
16 0.403579 0.676320
17 0.052582 0.055370
18 -0.484777 -0.985772
A B
0 22.327190 72.617646
1 22.096664 171.041874
2 167.596834 247.194572
3 222.068703 224.674680
4 118.584127 115.196304
5 255.000000 201.344798
6 135.639699 185.059394
7 78.300845 167.337476
8 0.000000 32.700971
9 2.332350 34.737551
10 100.160748 72.272798
11 216.861207 217.863993
12 208.331620 213.313653
13 183.355519 195.583380
14 182.289206 206.398778
15 191.834063 255.000000
16 185.539101 252.031411
17 126.235309 157.873501
18 35.443994 0.000000
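I don't want a different mapping for each column. I need to map the values using -0.985772 and 0.814692 as the bounds (column B, row 18 and column A, row 5).
You have two ways to do this: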
# Manually:
min_value, max_value = df.min().min(), df.max().max()
scaled1 = (df - min_value) * 255 / (max_value - min_value)
# Using MinMaxScaler
min_max_scaler = MinMaxScaler(feature_range=(0,255))
# Stack everything into a single column to scale by the global min / max
tmp = df.to_numpy().reshape(-1,1)
# Scale, then restore the original (rows, columns) shape
scaled2 = min_max_scaler.fit_transform(tmp).reshape(df.shape)
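The reshape trick works because MinMaxScaler computes its minimum and maximum per column (feature), so stacking every value into a single column leaves only one global pair. Here is a minimal sketch on a small made-up frame (the name demo and its values are illustrative assumptions, not the question's data) that inspects the fitted data_min_ and data_max_ attributes:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Small illustrative frame (made-up values, not the question's data)
demo = pd.DataFrame({"A": [-0.5, 0.0, 0.8], "B": [-1.0, 0.2, 0.6]})

# Fitting on the 2-D frame stores one min/max per column
per_column = MinMaxScaler(feature_range=(0, 255)).fit(demo)
print(per_column.data_min_, per_column.data_max_)

# Fitting on a single stacked column stores one global min/max
stacked = demo.to_numpy().reshape(-1, 1)
global_scaler = MinMaxScaler(feature_range=(0, 255)).fit(stacked)
print(global_scaler.data_min_, global_scaler.data_max_)

# Transform the stacked column, then restore the original shape
demo_scaled = global_scaler.transform(stacked).reshape(demo.shape)
```

Reshaping with demo.shape rather than a hard-coded column count keeps the same code usable if the frame has more than two columns.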
Both return the same result:
import numpy as np

np.isclose(scaled1, scaled2).all()
# True
You can create a new DataFrame from the scaled values:
scaled = pd.DataFrame(scaled1, index=df.index, columns=df.columns)
Or assign them back to df:
df.loc[:] = scaled1
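If you need this in more than one place, the manual approach can be wrapped in a small function. The following is only a sketch: scale_to_global_range is a hypothetical helper name, the demo frame holds made-up values, and raising an error for a constant frame is an assumption of this sketch rather than what MinMaxScaler does.

```python
import numpy as np
import pandas as pd

def scale_to_global_range(frame, low=0.0, high=255.0):
    """Hypothetical helper: linearly map every value of `frame` to [low, high]
    using the minimum and maximum taken over the whole frame."""
    values = frame.to_numpy(dtype=float)
    global_min, global_max = values.min(), values.max()
    if global_max == global_min:
        # Assumption of this sketch: refuse to scale a constant frame
        raise ValueError("cannot scale a frame whose values are all equal")
    # Arithmetic on the DataFrame keeps its index and column labels
    return (frame - global_min) * (high - low) / (global_max - global_min) + low

# Made-up values for illustration only
demo = pd.DataFrame({"A": [-0.5, 0.0, 0.8], "B": [-1.0, 0.2, 0.6]})
scaled_demo = scale_to_global_range(demo)
print(scaled_demo)
```

Because the arithmetic is applied to the DataFrame itself, the result keeps the original index and column names, so no separate pd.DataFrame(...) call is needed.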