Lyn*_*ynn 6 python numpy pandas
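I have a dataset whose values I want to de-aggregate into their own unique rows, and then pivot so that the rows are grouped by category.
Data (updated)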
Period Date Area BB stat AA stat CC stat DD stat BB test AA test CC test DD test BB re AA re CC re BB test2 AA test2 CC test2 DD test2
8/1/2016 9/1/2016 NY 5 5 5 1 1 1 0 0 0 0 0 0 0
9/1/2016 10/1/2016 NY 6 6 6 4 4 4 0 0 0 0 0 0 0
8/1/2016 9/1/2016 CA 2 2 2 4 4 4 0 0 0 0 0 0 0
9/1/2016 10/1/2016 CA 1 1 1 -2 -2 -2 0 0 0 0 0 0 0
Desired output
Period Date Area stat test type re test2
8/1/2016 9/1/2016 NY 5 1 BB 0 0
9/1/2016 10/1/2016 NY 6 4 BB 0 0
8/1/2016 9/1/2016 NY 5 1 AA 0 0
9/1/2016 10/1/2016 NY 6 4 AA 0 0
8/1/2016 9/1/2016 NY 5 1 CC 0 0
9/1/2016 10/1/2016 NY 6 4 CC 0 0
8/1/2016 9/1/2016 NY 0 0 DD 0 0
9/1/2016 10/1/2016 NY 0 0 DD 0 0
8/1/2016 9/1/2016 CA 2 4 BB 0 0
9/1/2016 10/1/2016 CA 1 -2 BB 0 0
8/1/2016 9/1/2016 CA 2 4 AA 0 0
9/1/2016 10/1/2016 CA 1 -2 AA 0 0
8/1/2016 9/1/2016 CA 2 4 CC 0 0
9/1/2016 10/1/2016 CA 1 -2 CC 0 0
8/1/2016 9/1/2016 CA 0 0 DD 0 0
9/1/2016 10/1/2016 CA 0 0 DD 0 0
What I am doing
import pandas as pd

value_vars = ["BB stat", "AA stat", "CC stat", "DD stat", "BB test",
              "AA test", "CC test", "DD test", "BB re", "AA re", "CC re"]
df = df.melt(id_vars=["Period", "Date", "Area"], value_vars=value_vars)
# Column names look like "BB stat", so split once on the space, not on "_"
temp_df = df["variable"].str.split(" ", n=1, expand=True)
df["type"] = temp_df[0]
df["name"] = temp_df[1]
df = df.drop(columns=["variable"])
# Spread the measure names ("stat", "test", "re") back out into columns,
# one row per Period/Date/Area/type; "DD" has no "re" column, so it gets NaN
df = df.pivot(index=["Period", "Date", "Area", "type"],
              columns="name", values="value").reset_index()
df.columns.name = None
df = df[["Period", "Date", "Area", "stat", "test", "type", "re"]]
df = df.sort_values(["Area", "type"], ascending=False)
print(df.to_markdown())  # to_markdown requires the optional tabulate package
The following code fails to capture all of the output columns. Any suggestions are appreciated.
pd.wide_to_long(df,
stubnames=['AA', 'BB','CC','DD'],
i=['Period','Date','Area'],
j='',
sep=' ',
suffix='(test|re|stat)'
).unstack(level=-1, fill_value=0).stack(level=0).reset_index()
Output:
Period Date Area type re stat test
0 8/1/2016 9/1/2016 CA AA 0.0 2.0 4.0
1 8/1/2016 9/1/2016 CA BB 0.0 2.0 4.0
2 8/1/2016 9/1/2016 CA CC 0.0 2.0 4.0
3 8/1/2016 9/1/2016 CA DD NaN 0.0 0.0
4 8/1/2016 9/1/2016 NY AA 0.0 5.0 1.0
5 8/1/2016 9/1/2016 NY BB 0.0 5.0 1.0
6 8/1/2016 9/1/2016 NY CC 0.0 5.0 1.0
7 8/1/2016 9/1/2016 NY DD NaN 0.0 0.0
8 9/1/2016 10/1/2016 CA AA 0.0 1.0 -2.0
9 9/1/2016 10/1/2016 CA BB 0.0 1.0 -2.0
10 9/1/2016 10/1/2016 CA CC 0.0 1.0 -2.0
11 9/1/2016 10/1/2016 CA DD NaN 0.0 0.0
12 9/1/2016 10/1/2016 NY AA 0.0 6.0 4.0
13 9/1/2016 10/1/2016 NY BB 0.0 6.0 4.0
14 9/1/2016 10/1/2016 NY CC 0.0 6.0 4.0
15 9/1/2016 10/1/2016 NY DD NaN 0.0 0.0
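One likely reason the `test2` columns are missing from the output above is that the `suffix` pattern `(test|re|stat)` is matched against the whole remainder of each column name, so `AA test2` and its siblings are never picked up. The following is a minimal sketch of one possible fix, not a definitive answer. It assumes the blank `DD stat` and `DD test` cells in the sample data are 0 (as the desired output suggests), and the level name `measure` and the variable names `long` and `result` are introduced here only for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question.
# Assumption: the blank "DD stat" / "DD test" cells are 0.
df = pd.DataFrame({
    "Period": ["8/1/2016", "9/1/2016", "8/1/2016", "9/1/2016"],
    "Date": ["9/1/2016", "10/1/2016", "9/1/2016", "10/1/2016"],
    "Area": ["NY", "NY", "CA", "CA"],
    "BB stat": [5, 6, 2, 1], "AA stat": [5, 6, 2, 1],
    "CC stat": [5, 6, 2, 1], "DD stat": [0, 0, 0, 0],
    "BB test": [1, 4, 4, -2], "AA test": [1, 4, 4, -2],
    "CC test": [1, 4, 4, -2], "DD test": [0, 0, 0, 0],
    "BB re": [0, 0, 0, 0], "AA re": [0, 0, 0, 0], "CC re": [0, 0, 0, 0],
    "BB test2": [0, 0, 0, 0], "AA test2": [0, 0, 0, 0],
    "CC test2": [0, 0, 0, 0], "DD test2": [0, 0, 0, 0],
})

# suffix=r"\w+" accepts any word after the space, including "test2".
long = pd.wide_to_long(
    df,
    stubnames=["AA", "BB", "CC", "DD"],
    i=["Period", "Date", "Area"],
    j="measure",  # hypothetical name for the stat/test/re/test2 level
    sep=" ",
    suffix=r"\w+",
)

# Move AA/BB/CC/DD into the row index as "type", then spread the
# measures back out into columns. "DD re" does not exist in the input,
# so that cell is filled with 0 here.
result = (
    long.rename_axis(columns="type")
        .stack()
        .unstack("measure")
        .fillna(0)
        .astype(int)
        .rename_axis(columns=None)
        .reset_index()
)
result = result[["Period", "Date", "Area", "stat", "test", "type", "re", "test2"]]
print(result)
```

The rows come out sorted by the index levels rather than in the order of the desired table, so a final `sort_values` may still be needed. On pandas 2.1 and later, `stack()` may emit a `FutureWarning` about its new implementation; the result here should be the same either way because missing cells are filled afterwards.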