Adding StandardScaler() output as a new column to a DataFrame returns some NaNs

Question

zin*_*rim 9 python nan pandas scikit-learn

I have a pandas DataFrame:

df['total_price'].describe()

returns

count    24895.000000
mean       216.377369
std        161.246931
min          0.000000
25%        109.900000
50%        174.000000
75%        273.000000
max       1355.900000
Name: total_price, dtype: float64

When I apply preprocessing.StandardScaler():

x = df[['total_price']]
standard_scaler = preprocessing.StandardScaler()
x_scaled = standard_scaler.fit_transform(x)
df['new_col'] = pd.DataFrame(x_scaled)

My new column with the standardized values contains some NaNs:

df[['total_price', 'new_col']].head()

    total_price new_col
0   241.95      0.158596
1   241.95      0.158596
2   241.95      0.158596
3   81.95      -0.833691
4   81.95      -0.833691

df[['total_price', 'new_col']].tail()

        total_price new_col
28167   264.0       NaN
28168   264.0       NaN
28176   94.0        NaN
28177   166.0       NaN
28178   166.0       NaN

What is going wrong here?

Answer 1

uke*_*emi 4

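The index of your DataFrame has gaps: in the tail() output above, the labels jump from 28168 to 28176, and describe() counts 24895 rows although the labels run up to 28178.

When you call pd.DataFrame(x_scaled) you create a new, contiguous index (0, 1, 2, ...). pandas aligns on index labels when you assign it as a column of the original DataFrame, so many rows will not match, and any label with no counterpart in the new index gets NaN. You can solve this by resetting the index of the original DataFrame (df.reset_index(), which returns a new DataFrame, so assign the result back) or by updating x in place (x.update(x_scaled)).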
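A minimal sketch of the effect described above, using made-up toy data (the prices and index labels below are hypothetical, chosen only so that the index has gaps). It reproduces the NaNs and then shows two ways to sidestep label alignment that are alternatives to the fixes named in the answer: assigning the NumPy array directly, or building the intermediate DataFrame with the original index.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: the index has gaps, as it would after filtering rows.
df = pd.DataFrame(
    {"total_price": [241.95, 81.95, 264.0, 94.0, 166.0]},
    index=[0, 1, 3, 7, 8],
)

scaler = StandardScaler()
x_scaled = scaler.fit_transform(df[["total_price"]])  # NumPy array, shape (5, 1)

# Reproduces the problem: the new DataFrame gets index 0..4, and pandas
# aligns on index labels, so labels 7 and 8 find no match and become NaN.
df["misaligned"] = pd.DataFrame(x_scaled)

# Option A: assign the array itself; arrays are assigned by position.
df["scaled"] = x_scaled[:, 0]

# Option B: keep the DataFrame wrapper but give it the original index.
df["scaled_via_index"] = pd.DataFrame(x_scaled, index=df.index)[0]

print(df)
```

Note that in the misaligned column the row labelled 3 is not NaN, but it holds the scaled value of the fourth row rather than the third, so the mismatch can also produce silently wrong values, not just missing ones.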
