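I'm using Pandas v1.1.0 to run groupby rolling count, sum and mean, and I noticed that the rolling count is considerably slower than the rolling mean and sum. This seems counterintuitive, since the count could be derived from the mean and the sum, which would save time. Is this a bug, or am I missing something? Thanks for any advice.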
import pandas as pd
# Generate sample df
df = pd.DataFrame({'column1': range(600), 'group': 5*['l'+str(i) for i in range(120)]})
# sort by group for easy/efficient joining of new columns to df
df=df.sort_values('group',kind='mergesort').reset_index(drop=True)
# timing of groupby rolling count, sum and mean
%timeit df['mean']=df.groupby('group').rolling(3,min_periods=1)['column1'].mean().values
%timeit df['sum']=df.groupby('group').rolling(3,min_periods=1)['column1'].sum().values
%timeit df['count']=df.groupby('group').rolling(3,min_periods=1)['column1'].count().values
### Output
6.14 ms ± 812 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.61 ms ± 179 µs per loop (mean ± std. …
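To make the "derive the count from the sum" idea concrete, here is a minimal sketch on the same sample data. Dividing the rolling sum by the rolling mean is not safe in general: wherever the window mean is 0 (as in the first row of the sample, where `column1` is 0) it yields 0/0, i.e. NaN. The sketch therefore takes the rolling sum of an indicator column instead. The column name `ones` and the variable names are illustrative assumptions, not part of the original code.

```python
import numpy as np
import pandas as pd

# Same sample frame as in the question
df = pd.DataFrame({'column1': range(600),
                   'group': 5 * ['l' + str(i) for i in range(120)]})
df = df.sort_values('group', kind='mergesort').reset_index(drop=True)

# Hypothetical helper column: 1.0 where column1 holds a value, 0.0 where it is NaN
df['ones'] = df['column1'].notna().astype(float)

# Rolling sum of the indicator gives the number of non-NaN observations per window
count_via_sum = (
    df.groupby('group').rolling(3, min_periods=1)['ones'].sum().values
)

# Direct rolling count, as in the question, for comparison
direct_count = (
    df.groupby('group').rolling(3, min_periods=1)['column1'].count().values
)

# Both approaches should agree on this data
print(np.array_equal(count_via_sum, direct_count))
```

Whether this is actually faster than `.count()` may depend on the pandas version, so it is worth timing both with `%timeit` on your own data before relying on it.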