How do I get this output?

Question

I have a large pandas DataFrame. Part of it looks like this:

 Rule_Name Rule_Seq_No  Condition Expression  Type   

Rule P     1            ID         19909       Action      
Rule P     1            Type       A           Condition   
Rule P     1            System     B           Condition   
Rule P     2            ID         19608       Action      
Rule P     2            Type       A           Condition  
Rule P     2            System     C           Condition   
Rule S     1            ID         19909       Action      
Rule S     1            Type       A           Condition   
Rule S     1            System     M           Condition   
Rule S     2            ID         19608       Action     
Rule S     2            Type       C           Condition   
Rule S     2            System     F           Condition


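The table contains rules, each with a sequence number.

I tried different functions such as merge, groupby, and apply, but I did not get the desired output.

The expected output should look like this: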

 Rule_Name  Rule_Seq_No        Condition          Action  

Rule P       1            (Type=A)and(System=B)    19909   
Rule P       2            (Type=A)and(System=C)    19608   
Rule S       1            (Type=A)and(System=M)    19909   
Rule S       2            (Type=C)and(System=F)    19608


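For rows that share the same Rule_Name and Rule_Seq_No and whose Type is Condition, I want to merge the rows into a single Condition string. For the row whose Type is Action, the Expression value should be shown in a separate Action column.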

Answer 1

jez*_*ael 5

Use:

df1 = (df.assign(Condition = '(' + df['Condition'] + '=' + df['Expression'] + ')')
         .groupby(['Rule_Name','Rule_Seq_No','Type'])
         .agg({'Condition': 'and'.join, 'Expression':'first'})
         .unstack()
         .drop([('Condition','Action'), ('Expression','Condition')], axis=1)
         .droplevel(axis=1, level=0)
         .reset_index()
         .rename_axis(None, axis=1))
print (df1)
  Rule_Name  Rule_Seq_No              Condition Action
0    Rule P            1  (Type=A)and(System=B)  19909
1    Rule P            2  (Type=A)and(System=C)  19608
2    Rule S            1  (Type=A)and(System=M)  19909
3    Rule S            2  (Type=C)and(System=F)  19608


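Explanation:

Join the Condition and Expression columns with = and wrap the result in ()
Aggregate with GroupBy.agg, using join for Condition and first for Expression
Reshape with DataFrame.unstack
Remove the unnecessary columns with DataFrame.drop, passing tuples because the columns are a MultiIndex
Remove the top level of the MultiIndex with DataFrame.droplevel
Clean up the result with DataFrame.reset_index and DataFrame.rename_axis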
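
The steps above can be traced one at a time on the sample rows from the question. This is only a sketch, assuming every value in Expression is stored as a string (otherwise the string concatenation in the first step would need an astype(str)). The intermediate names labelled, agg and wide are illustrative and not part of the original answer:

```python
import pandas as pd

# Sample rows from the question; Expression is assumed to hold strings.
rows = [
    ("Rule P", 1, "ID", "19909", "Action"),
    ("Rule P", 1, "Type", "A", "Condition"),
    ("Rule P", 1, "System", "B", "Condition"),
    ("Rule P", 2, "ID", "19608", "Action"),
    ("Rule P", 2, "Type", "A", "Condition"),
    ("Rule P", 2, "System", "C", "Condition"),
    ("Rule S", 1, "ID", "19909", "Action"),
    ("Rule S", 1, "Type", "A", "Condition"),
    ("Rule S", 1, "System", "M", "Condition"),
    ("Rule S", 2, "ID", "19608", "Action"),
    ("Rule S", 2, "Type", "C", "Condition"),
    ("Rule S", 2, "System", "F", "Condition"),
]
df = pd.DataFrame(
    rows,
    columns=["Rule_Name", "Rule_Seq_No", "Condition", "Expression", "Type"],
)

# Step 1: turn each row into a "(name=value)" string.
labelled = df.assign(
    Condition="(" + df["Condition"] + "=" + df["Expression"] + ")"
)

# Step 2: one row per (rule, sequence number, Type).
# Condition strings are joined with "and"; Expression keeps its first value.
agg = (labelled
       .groupby(["Rule_Name", "Rule_Seq_No", "Type"])
       .agg({"Condition": "and".join, "Expression": "first"}))

# Step 3: move Type from the index into a second column level.
wide = agg.unstack()
print(wide.columns.tolist())  # inspect the two-level column labels

# Steps 4-6: keep only the two useful columns, flatten the labels,
# and restore the grouping keys as ordinary columns.
df1 = (wide
       .drop([("Condition", "Action"), ("Expression", "Condition")], axis=1)
       .droplevel(level=0, axis=1)
       .reset_index()
       .rename_axis(None, axis=1))
print(df1)
```

Printing wide.columns after the unstack step makes it easier to see why DataFrame.drop needs tuples here: each column label is a (top level, Type) pair.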

Edit:

Solution for older pandas versions (before 0.24, where DataFrame.droplevel is not available), using Index.droplevel on the columns instead:

df1 = (df.assign(Condition = '(' + df['Condition'] + '=' + df['Expression'] + ')')
         .groupby(['Rule_Name','Rule_Seq_No','Type'])
         .agg({'Condition': 'and'.join, 'Expression':'first'})
         .unstack()
         .drop([('Condition','Action'), ('Expression','Condition')], axis=1))

df1.columns = df1.columns.droplevel(level=0)
df1 = df1.reset_index().rename_axis(None, axis=1)
print (df1)
  Rule_Name  Rule_Seq_No              Condition Action
0    Rule P            1  (Type=A)and(System=B)  19909
1    Rule P            2  (Type=A)and(System=C)  19608
2    Rule S            1  (Type=A)and(System=M)  19909
3    Rule S            2  (Type=C)and(System=F)  19608
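
Both variants remove the same column level; they differ only in where droplevel is called. A small sketch on a hypothetical one-row, two-level frame (the frame wide and its values are made up for illustration) suggests that assigning the result of Index.droplevel to the columns gives the same labels as DataFrame.droplevel where the latter exists:

```python
import pandas as pd

# Hypothetical frame with the two-level columns left after the drop step.
cols = pd.MultiIndex.from_tuples(
    [("Condition", "Condition"), ("Expression", "Action")],
    names=[None, "Type"],
)
wide = pd.DataFrame([["(Type=A)and(System=B)", "19909"]], columns=cols)

# Older style: drop the level on the column index and assign it back.
old_style = wide.copy()
old_style.columns = old_style.columns.droplevel(level=0)

# Newer style: DataFrame.droplevel, guarded in case it is not available.
if hasattr(pd.DataFrame, "droplevel"):
    new_style = wide.droplevel(level=0, axis=1)
    assert list(new_style.columns) == list(old_style.columns)

print(list(old_style.columns))
```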

