jac*_*k55 3 python sum aggregation dataframe pandas
I have the following DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame({"ID": ["1", "1", "1", "2", "2", "2", "1"],
                   "status": ["ac", "not", "not", "ac", np.nan, "ac", "oth"]})
I need to build a DataFrame with columns like the following:
Can you help me build a DF like this?
You can use a conditional mask to replace anything that is not ac or np.nan with Other, then apply groupby.value_counts, and finally use add_prefix:
u = df['status'].where(df['status'].eq("ac") | df['status'].isna(), "Other")
out = (u.groupby(df['ID']).value_counts(dropna=False).unstack(fill_value=0)
       .add_prefix("Number_").reset_index().rename_axis(None, axis=1))
Alternatively:
# Missing values get the string label "nan": mixing str and float choices in np.select can fail on recent NumPy
a = pd.Series(np.select([df['status'].eq("ac"), df['status'].isna()],
                        ["ac", "nan"], "Other"))
out = (a.groupby(df['ID']).value_counts().unstack(fill_value=0)
       .add_prefix("Number_").reset_index())
print(out)
  ID  Number_nan  Number_Other  Number_ac
0  1           0             3          1
1  2           1             0          2
Similar logic, but with crosstab as suggested by @Shubham:
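u = df['status'].where(df['status'].eq("ac") | df['status'].isna(), "Other")
# rename_axis needs axis=1 here: it should clear the columns name, not the ID index name
out = (pd.crosstab(df['ID'], u.fillna("NAN"), dropna=False)
       .add_prefix("Number_").rename_axis(None, axis=1).reset_index())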
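
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the mask-plus-crosstab idea. The helper name count_status and the explicit reindex step are my own additions, not part of the answer above. The reindex assumes you always want all three columns, in a fixed order, even when one category never appears in the data.

```python
import numpy as np
import pandas as pd


def count_status(df: pd.DataFrame) -> pd.DataFrame:
    """Count 'ac', missing, and all other statuses per ID (hypothetical helper)."""
    # Keep "ac" and missing values; everything else becomes "Other".
    keep = df["status"].eq("ac") | df["status"].isna()
    u = df["status"].where(keep, "Other")
    # Label missing values explicitly so they get their own column.
    u = u.fillna("nan")
    out = pd.crosstab(df["ID"], u)
    # Assumption: force all three columns to exist, in a fixed order.
    out = out.reindex(columns=["nan", "Other", "ac"], fill_value=0)
    return out.add_prefix("Number_").rename_axis(None, axis=1).reset_index()


df = pd.DataFrame({
    "ID": ["1", "1", "1", "2", "2", "2", "1"],
    "status": ["ac", "not", "not", "ac", np.nan, "ac", "oth"],
})
result = count_status(df)
print(result)
```

With the sample data, ID 1 should end up with one ac and three Other, and ID 2 with two ac and one missing value.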