avi*_*iss 2 python pandas pandas-groupby
Here is an example of my data frame:
import pandas as pd

df_lst = [
{"wordcount": 100, "Stats": 198765, "id": 34},
{"wordcount": 99, "Stats": 98765, "id": 35},
{"wordcount": 200, "Stats": 18765, "id": 36},
{"wordcount": 250, "Stats": 788765, "id": 37},
{"wordcount": 345, "Stats": 12765, "id": 38},
{"wordcount": 456, "Stats": 238765, "id": 39},
{"wordcount": 478, "Stats": 1934, "id": 40},
{"wordcount": 890, "Stats": 19845, "id": 41},
{"wordcount": 812, "Stats": 1987, "id": 42}]
df = pd.DataFrame(df_lst)
df.set_index('id', inplace=True)
df.head()
DF:
     Stats  wordcount
id
34  198765        100
35   98765         99
36   18765        200
37  788765        250
38   12765        345
I want to compute the mean of Stats for each wordcount range, in steps of 100, so that the new data frame looks like this:
Average wordcount
194567 100
23456 200
2378 300
...
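where 100 means 0-100, and so on. I started writing multiple conditions, but I feel there must be a more efficient way to do this. Any help is greatly appreciated.
Use pd.cut() to bin wordcount, then group by the bins: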
In [92]: bins = np.arange(0, df['wordcount'].max().round(-2) + 100, 100)
In [94]: df.groupby(pd.cut(df['wordcount'], bins=bins, labels=bins[1:]))['Stats'].mean()
Out[94]:
wordcount
100 148765.0
200 18765.0
300 788765.0
400 12765.0
500 120349.5
600 NaN
700 NaN
800 NaN
900 10916.0
Name: Stats, dtype: float64
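For reference, the same approach can be written as a standalone script. This is only a sketch under a few assumptions that are not in the original answer: the `step` and `upper` names are made up for illustration, the upper edge is rounded up with `np.ceil` instead of `.round(-2)` (rounding to the nearest hundred would leave a maximum such as 812 outside the last bin), and `observed=False` is passed explicitly because newer pandas versions may otherwise drop the empty bins from the result.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, indexed by id.
df = pd.DataFrame(
    {
        "wordcount": [100, 99, 200, 250, 345, 456, 478, 890, 812],
        "Stats": [198765, 98765, 18765, 788765, 12765, 238765, 1934, 19845, 1987],
    },
    index=pd.Index(range(34, 43), name="id"),
)

step = 100  # hypothetical name for the bin width

# Round the maximum *up* to the next multiple of step so that the last
# bin always covers it.
upper = int(np.ceil(df["wordcount"].max() / step)) * step
bins = np.arange(0, upper + step, step)  # edges 0, 100, ..., upper

# Right-closed intervals (0, 100], (100, 200], ... labelled by upper edge.
binned = pd.cut(df["wordcount"], bins=bins, labels=bins[1:])

# observed=False keeps empty bins in the result (their mean is NaN).
result = df.groupby(binned, observed=False)["Stats"].mean()
print(result)
```

Because the intervals are open on the left, a wordcount of exactly 0 would not fall into any bin; passing `include_lowest=True` to pd.cut() is one way to include it. To drop the empty bins instead of showing NaN, `result.dropna()` should be enough.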