"FutureWarning: Dropping of nuisance columns in DataFrame reductions" warning when using df.mean()

Question


I have a DataFrame that looks like this:

   col1   col2 col3
0     1   True  abc
1     2  False  def
2     3   True  ghi


When I run df.mean(), it shows a warning:

>>> df.mean()
<stdin>:1: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError.  Select only valid columns before calling the reduction.
col1    2.000000
col2    0.666667
dtype: float64


How can I resolve this warning?

Answer 1

小智 7

Numeric functions such as mean, median, sem, and skew only support numeric values. If you look at the dtypes of your columns...

>>> df.dtypes
col1     int64
col2      bool
col3    object
dtype: object

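...you can see that the dtype of col1 is int64, which mean can handle because it is numeric. Likewise, the dtype of col2 is bool, which Python, pandas, and NumPy essentially treat as integers, so mean treats col2 as if it contained only 1 (for True) and 0 (for False).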
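
The bool-as-integer behaviour described here can be checked directly. The following is a minimal sketch; the variable names (`flags`, `bool_mean`, `int_mean`) are made up for illustration and the values mirror col2 from the question:

```python
import pandas as pd

# In plain Python, bool is a subclass of int, so True behaves like 1.
print(isinstance(True, int))
print(True + True)

# Hypothetical Series holding the same values as col2 in the question.
flags = pd.Series([True, False, True])

# Taking the mean of the booleans gives the same result as
# converting them to integers (1, 0, 1) first.
bool_mean = flags.mean()
int_mean = flags.astype(int).mean()
print(bool_mean, int_mean)
```

Both means should agree, which is why col2 shows up in the output of df.mean() as the fraction of True values.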


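The dtype of col3, however, is object, the default dtype for strings. It is basically a generic type used to wrap any kind of data that pandas doesn't understand. Since it isn't numeric, mean doesn't know how to handle it. (After all, how would you compute the mean of abc and def?)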


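There are a few ways to solve this, but "ignoring it" isn't one of them: as the warning indicates, in a future version of pandas this warning will become an error, which will prevent your code from running.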


Use numeric_only=True. In this case, this causes mean to skip the non-numeric column, col3:

```
>>> df.mean(numeric_only=True)
col1    2.000000
col2    0.666667
dtype: float64
```

(Note that col3 is omitted.)

Select only the columns you need to operate on:

>>> df[['col1', 'col2']].mean()
col1    2.000000
col2    0.666667
dtype: float64
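
When the column names are not known in advance, one way to select only valid columns is to pick them by dtype rather than by name. This is a sketch, not part of the original answer: it assumes `DataFrame.select_dtypes` is an acceptable way to do the selection, and it rebuilds the question's DataFrame by hand.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": [True, False, True],
    "col3": ["abc", "def", "ghi"],
})

# Keep numeric and boolean columns; "number" alone would not include bool.
valid = df.select_dtypes(include=["number", "bool"])

# col3 has already been dropped, so the reduction sees only valid columns.
means = valid.mean()
print(means)
```

Selecting by dtype keeps the call explicit about which columns take part in the reduction, which is what the warning message asks for.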
