Tricky long pivot by reverse aggregation transformation (Pandas)

Lyn*_*ynn 6 python numpy pandas

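I have a dataset in which I would like to de-aggregate the values into their own unique rows and perform a pivot, grouping by category.

Data (updated)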

Period      Date        Area    BB stat AA stat CC stat DD stat BB test AA test CC test DD test BB re   AA re   CC re BB test2  AA test2 CC test2  DD test2                                                
8/1/2016    9/1/2016    NY      5       5       5               1       1       1               0       0       0     0          0       0         0
9/1/2016    10/1/2016   NY      6       6       6               4       4       4               0       0       0     0          0       0         0
8/1/2016    9/1/2016    CA      2       2       2               4       4       4               0       0       0     0          0       0         0
9/1/2016    10/1/2016   CA      1       1       1              -2      -2      -2               0       0       0     0          0       0         0

Desired

Period      Date            Area    stat    test    type    re  test2
8/1/2016    9/1/2016        NY      5       1       BB      0   0
9/1/2016    10/1/2016       NY      6       4       BB      0   0
8/1/2016    9/1/2016        NY      5       1       AA      0   0   
9/1/2016    10/1/2016       NY      6       4       AA      0   0   
8/1/2016    9/1/2016        NY      5       1       CC      0   0
9/1/2016    10/1/2016       NY      6       4       CC      0   0   
8/1/2016    9/1/2016        NY      0       0       DD      0   0
9/1/2016    10/1/2016       NY      0       0       DD      0   0
8/1/2016    9/1/2016        CA      2       4       BB      0   0
9/1/2016    10/1/2016       CA      1       -2      BB      0   0
8/1/2016    9/1/2016        CA      2       4       AA      0   0
9/1/2016    10/1/2016       CA      1       -2      AA      0   0
8/1/2016    9/1/2016        CA      2       4       CC      0   0
9/1/2016    10/1/2016       CA      1       -2      CC      0   0
8/1/2016    9/1/2016        CA      0       0       DD      0   0
9/1/2016    10/1/2016       CA      0       0       DD      0   0

What I am doing

value_vars = ["BB stat",    "AA stat",  "CC stat",  "DD stat",  "BB test",
"AA test",  "CC test",  "DD test",  "BB re",    "AA re",    "CC re"]
df = df.melt(id_vars=["Period", "Date", "Area"], value_vars=value_vars)


temp_df = df.variable.str.split("_", 1, expand=True)
df["type"] = temp_df[0]
df["name"] = temp_df[1]
df = df.drop(columns=["variable"])
first_half = df.iloc[:len(df)//2]
second_half = df.iloc[len(df)//2:]
df = pd.merge(first_half, second_half, on=["Period", "Date", "Area", "type"], suffixes=("_1", "_2"))


df.rename(columns = {'value_3':'stat', 'value_2':'test', 'value_1':'re'}, inplace = True)
df.drop(columns=["name_1", "name_2"], inplace=True)
df = df[[ "Period",     "Date",         "Area", "stat", "test", "type", "re"    ]]



df.sort_values(["Area", "type"], ascending=False, inplace=True)
df.to_markdown()

The code above does not capture all of the output columns. Any suggestions are appreciated.

Qua*_*ang 4

Try pd.wide_to_long:

pd.wide_to_long(df, 
                stubnames=['AA', 'BB','CC','DD'],
                i=['Period','Date','Area'],
                j='',
                sep=' ',
                suffix='(test|re|stat)'
).unstack(level=-1, fill_value=0).stack(level=0).reset_index()

Output:

      Period       Date Area type   re  stat  test
0   8/1/2016   9/1/2016   CA   AA  0.0   2.0   4.0
1   8/1/2016   9/1/2016   CA   BB  0.0   2.0   4.0
2   8/1/2016   9/1/2016   CA   CC  0.0   2.0   4.0
3   8/1/2016   9/1/2016   CA   DD  NaN   0.0   0.0
4   8/1/2016   9/1/2016   NY   AA  0.0   5.0   1.0
5   8/1/2016   9/1/2016   NY   BB  0.0   5.0   1.0
6   8/1/2016   9/1/2016   NY   CC  0.0   5.0   1.0
7   8/1/2016   9/1/2016   NY   DD  NaN   0.0   0.0
8   9/1/2016  10/1/2016   CA   AA  0.0   1.0  -2.0
9   9/1/2016  10/1/2016   CA   BB  0.0   1.0  -2.0
10  9/1/2016  10/1/2016   CA   CC  0.0   1.0  -2.0
11  9/1/2016  10/1/2016   CA   DD  NaN   0.0   0.0
12  9/1/2016  10/1/2016   NY   AA  0.0   6.0   4.0
13  9/1/2016  10/1/2016   NY   BB  0.0   6.0   4.0
14  9/1/2016  10/1/2016   NY   CC  0.0   6.0   4.0
15  9/1/2016  10/1/2016   NY   DD  NaN   0.0   0.0
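The output above has no test2 column, and DD shows NaN for re because the data has no "DD re" column. One possible way to cover both, still using pd.wide_to_long, is to flip each column name first so that the measure (stat, test, re, test2) becomes the stub and the category (BB, AA, CC, DD) becomes the suffix. This is only a sketch under a few assumptions: the blank DD stat / DD test cells in the question are read as NaN, missing values should become 0 as in the desired table, and the names `ids` and `flipped` are made up for illustration.

```python
import pandas as pd

nan = float("nan")  # assumption: the blank DD cells in the question are missing values

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    "Period": ["8/1/2016", "9/1/2016", "8/1/2016", "9/1/2016"],
    "Date": ["9/1/2016", "10/1/2016", "9/1/2016", "10/1/2016"],
    "Area": ["NY", "NY", "CA", "CA"],
    "BB stat": [5, 6, 2, 1], "AA stat": [5, 6, 2, 1], "CC stat": [5, 6, 2, 1],
    "DD stat": [nan] * 4,
    "BB test": [1, 4, 4, -2], "AA test": [1, 4, 4, -2], "CC test": [1, 4, 4, -2],
    "DD test": [nan] * 4,
    "BB re": [0] * 4, "AA re": [0] * 4, "CC re": [0] * 4,
    "BB test2": [0] * 4, "AA test2": [0] * 4, "CC test2": [0] * 4, "DD test2": [0] * 4,
})

ids = ["Period", "Date", "Area"]  # hypothetical helper name for the id columns

# Flip "BB stat" -> "stat_BB" so the measure is the stub and the category is the suffix
flipped = df.rename(
    columns=lambda c: c if c in ids else "_".join(reversed(c.split(" ", 1)))
)

out = (
    pd.wide_to_long(
        flipped,
        stubnames=["stat", "test", "re", "test2"],
        i=ids,
        j="type",
        sep="_",
        suffix=r"[A-Z]+",  # matches BB, AA, CC, DD
    )
    .fillna(0)      # the absent "DD re" column and blank DD cells become 0
    .astype(int)
    .reset_index()
)

# Column order from the desired table
out = out[ids + ["stat", "test", "type", "re", "test2"]]
print(out)
```

Because the separator follows the stub in the match, the stub "test" only picks up test_BB and so on, not test2_BB. The row order may differ from the desired table; sort_values(["Area", "type"]) can be applied afterwards if the order matters.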