Qiy*_*iyu 6 python dataframe pandas
I know that .agg can easily be used to compute averages. For example, if I have a DataFrame df:
df
one two three
A 1 2 3
B 4 5 6
C 7 8 9
If I want the average of each column, I can simply do this:
df.agg(np.average)
one 4.0
two 5.0
three 6.0
dtype: float64
Now suppose I am only interested in the average of column 'one'. Intuitively, I wrote the following, expecting the single number 4:
df.agg({'one':np.average}) #or df['one'].agg(np.average)
However, it returns the first column itself rather than 4:
one
A 1.0
B 4.0
C 7.0
Why?
Bat*_*man 14
There are many ways to do this, but you seem to have stumbled upon the only one that doesn't work. These all work for me:
df["one"].agg("mean")
df.agg({"one": "mean"})
df["one"].agg(np.mean)
df.agg({"one": np.mean})
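As a quick check, the string-based variants can be run against the small frame from the question. This is only a minimal sketch: the frame is rebuilt by hand from the values shown in the question, and the variable names (scalar_mean, dict_mean, and so on) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (values copied by hand).
df = pd.DataFrame(
    {"one": [1, 4, 7], "two": [2, 5, 8], "three": [3, 6, 9]},
    index=["A", "B", "C"],
)

# Aggregating a single column by name yields a scalar.
scalar_mean = df["one"].agg("mean")

# The dict form yields a one-element Series indexed by the column name.
dict_mean = df.agg({"one": "mean"})

# Two equivalent routes that bypass .agg entirely.
direct_mean = df["one"].mean()
numpy_mean = np.mean(df["one"].to_numpy())

print(scalar_mean, direct_mean, numpy_mean)
print(dict_mean)
```

Note that the dict form returns a Series rather than a bare number, so use dict_mean["one"] if a scalar is needed.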
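Looking at the source code, it seems that when you use average, the DataFrame is cast to a numpy array, and mean then by default takes the row-wise mean, because in the base case (no weights) average actually calls mean.
See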
def mean(a, axis=None, dtype=None, out=None, keepdims=np._NoValue):
    kwargs = {}
    if keepdims is not np._NoValue:
        kwargs['keepdims'] = keepdims
    if type(a) is not mu.ndarray:
        try:
            mean = a.mean
        except AttributeError:
            pass
        else:
            return mean(axis=axis, dtype=dtype, out=out, **kwargs)

    return _methods._mean(a, axis=axis, dtype=dtype,
                          out=out, **kwargs)
and
def average(a, axis=None, weights=None, returned=False):
    if (type(a) not in (np.ndarray, np.matrix) and
            issubclass(type(a), np.ndarray)):
        warnings.warn("np.average currently does not preserve subclasses, but "
                      "will do so in the future to match the behavior of most "
                      "other numpy functions such as np.mean. In particular, "
                      "this means calls which returned a scalar may return a "
                      "0-d subclass object instead.",
                      FutureWarning, stacklevel=2)

    if not isinstance(a, np.matrix):
        a = np.asarray(a)

    if weights is None:
        avg = a.mean(axis)
        scl = avg.dtype.type(a.size/avg.size)
    else:
        ...
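The unweighted branch quoted above can be observed directly on a plain ndarray. The sketch below uses only NumPy and the values from the question's frame. It illustrates that np.average without weights matches ndarray.mean for the same axis, but it does not by itself reproduce the pandas .agg behavior described in the question. The variable names and the weights are made up for illustration.

```python
import numpy as np

# Same values as the question's frame, as a plain 2-D array.
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# Without weights, np.average agrees with ndarray.mean for the same axis.
overall = np.average(a)              # axis=None: mean of all nine values
per_column = np.average(a, axis=0)   # one mean per column
per_row = np.average(a, axis=1)      # one mean per row

# With weights, np.average takes the other branch (hypothetical weights).
weighted_one = np.average(a[:, 0], weights=[1, 1, 2])

print(overall)       # 5.0
print(per_column)    # [4. 5. 6.]
print(per_row)       # [2. 5. 8.]
print(weighted_one)  # 4.75
```

Passing axis explicitly is the simplest way to avoid surprises about which dimension is being reduced.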