Pandas sumif with duplicate column names

tro*_*rob 2 python dataframe pandas

What is the best way to sum the columns of df2 that share a label, grouped by the columns of df3 below?

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(25).reshape((5,5)),index = ['A','B','C','D','E'])
df1 = pd.DataFrame(np.random.rand(15).reshape((5,3)),index = ['A','B','C','D','E'])
df2 = pd.concat([df,df1],axis=1)
df3 =  pd.DataFrame(np.random.rand(25).reshape((5,5)),columns = np.arange(5),index = ['A','B','C','D','E'])

The answer should have the shape of df3.

Edited for clarity:

df = pd.DataFrame(np.ones(25).reshape((5,5)),index = ['A','B','C','D','E'])
df1 = pd.DataFrame(np.ones(15).reshape((5,3))*2,index = ['A','B','C','D','E'],columns = [1,3,4])
df2 = pd.concat([df,df1],axis=1)
df3 =  pd.DataFrame(np.empty((5,5)),columns = np.arange(5),index = ['A','B','C','D','E'])
print(df2)
     0    1    2    3    4    1    3    4
A  1.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0
B  1.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0
C  1.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0
D  1.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0
E  1.0  1.0  1.0  1.0  1.0  2.0  2.0  2.0

The desired result is:

       0       1       2       3       4
A    1.0     3.0     1.0     3.0     3.0 
B    1.0     3.0     1.0     3.0     3.0 
C    1.0     3.0     1.0     3.0     3.0 
D    1.0     3.0     1.0     3.0     3.0 
E    1.0     3.0     1.0     3.0     3.0 

Max*_*axU 7

You can group the DataFrame by its column labels:

In [57]: df2.groupby(axis=1, by=df2.columns).sum()
Out[57]:
     0    1    2    3    4
A  1.0  3.0  1.0  3.0  3.0
B  1.0  3.0  1.0  3.0  3.0
C  1.0  3.0  1.0  3.0  3.0
D  1.0  3.0  1.0  3.0  3.0
E  1.0  3.0  1.0  3.0  3.0
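For readers on recent pandas releases, where (as far as I know) `groupby(axis=1)` is deprecated, the same column-wise grouping can be sketched by transposing first. This is a minimal sketch, not the answerer's method: it assumes the `df2` from the edited question, and the names `idx` and `result` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Rebuild df2 from the question: five columns of 1.0 plus
# three columns of 2.0 that reuse the labels 1, 3 and 4.
idx = ['A', 'B', 'C', 'D', 'E']
df = pd.DataFrame(np.ones((5, 5)), index=idx)
df1 = pd.DataFrame(np.ones((5, 3)) * 2, index=idx, columns=[1, 3, 4])
df2 = pd.concat([df, df1], axis=1)

# Transpose so the duplicate column labels become duplicate index labels,
# group rows by that index (level 0), sum, then transpose back.
result = df2.T.groupby(level=0).sum().T
print(result)
```

The double transpose copies the data, so for very wide or mixed-dtype frames this may be slower than a native column-wise grouping; for small numeric frames like this one it should not matter.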

You can also specify the axis by name:

In [58]: df2.groupby(axis='columns', by=df2.columns).sum()
Out[58]:
     0    1    2    3    4
A  1.0  3.0  1.0  3.0  3.0
B  1.0  3.0  1.0  3.0  3.0
C  1.0  3.0  1.0  3.0  3.0
D  1.0  3.0  1.0  3.0  3.0
E  1.0  3.0  1.0  3.0  3.0

A shorter version, from @piRSquared:

df2.groupby(df2.columns, 1).sum()

  • If you want to win at code golf, you can skip the parameter names :-) `df2.groupby(df2.columns,1).sum()`. (2 upvotes)