pandas: sum two rows of a dataframe without rearranging the dataframe?

ale*_*e19 7 python pandas

I have a dataframe and I am trying to sum two of its rows without messing up the row order.

> import pandas as pd

> index = ['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total']

> test = {'counts': pd.Series([10541, 4143, 736, 18, 45690], index=index), 'percents': pd.Series([23.07, 9.07, 1.61, 0.04, 100], index=index)}

> testdf = pd.DataFrame(test)

                  counts  percents
Daylight           10541     23.07
Dawn                4143      9.07
Other / unknown      736      1.61
Uncoded & errors      18      0.04
Total              45690    100.00

I want this output:

                  counts  percents
Daylight           10541     23.07
Dawn                4143      9.07
Other / unknown      754      1.65   <-- sum of 'other/unknown' and 'uncoded & errors'
Total              45690    100.00

This is the closest I have been able to get:

> sum_ = testdf.loc[['Other / unknown', 'Uncoded & errors']].sum().to_frame().transpose()

     counts   percents
0    754.00   1.65       

> sum_ = sum_.rename(index={0: 'Other / unknown'})

                counts   percents
Other / unknown 754.00   1.65   

> testdf.drop(['Other / unknown', 'Uncoded & errors'],inplace=True)
> testdf = pd.concat([testdf, sum_])  # DataFrame.append was removed in pandas 2.0; pd.concat also adds the row at the end

Daylight         10541  23.07
Dawn             4143   9.07
Total            45690  100
Other / unknown  754    1.65

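But this does not preserve the original row order.

I could insert the row by slicing the dataframe and placing the sum_ row between "Dawn" and "Total", but that would stop working if the row labels change, or if the order of the rows changes, and so on (this is for an annual booklet, so the table design may change from year to year), so I am trying to do this robustly.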
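One way to keep the approach from the question without hard-coding a position is to record the row order before dropping anything and reindex afterwards. The following is only a sketch under that assumption; the names `keep`, `absorb`, `original_order`, `combined`, and `result` are made up for illustration and do not come from the question.

```python
import pandas as pd

labels = ['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total']
testdf = pd.DataFrame(
    {'counts': [10541, 4143, 736, 18, 45690],
     'percents': [23.07, 9.07, 1.61, 0.04, 100.0]},
    index=labels,
)

# Hypothetical names: the row that survives and the row folded into it.
keep, absorb = 'Other / unknown', 'Uncoded & errors'

# Remember the row order (minus the absorbed row) before changing anything.
original_order = [label for label in testdf.index if label != absorb]

# Build the combined row as a one-row frame labelled with `keep`.
combined = testdf.loc[[keep, absorb]].sum().to_frame(name=keep).T

# Drop both source rows, add the combined row at the end, then restore the order.
# Note: 'counts' is upcast to float here because the combined row is float.
result = pd.concat([testdf.drop([keep, absorb]), combined]).reindex(original_order)
print(result)
```

Because the order is taken from the frame itself, this does not depend on where the two rows sit in the table, only on their labels.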

pet*_*lds 9

Although I prefer MaxU's answer, you can also try summing in place:

testdf.loc['Other / unknown'] += testdf.loc['Uncoded & errors']

Then drop the row by its index label:

testdf.drop(['Uncoded & errors'], inplace=True)

In [28]: testdf
Out[28]: 
                 counts  percents
Daylight          10541     23.07
Dawn               4143      9.07
Other / unknown     754      1.65
Total             45690    100.00
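The two steps in this answer can be wrapped in a small helper that works on a copy, so the original frame is left untouched. This is a minimal sketch; `merge_rows` and its parameters are hypothetical names, not part of pandas or of the answer.

```python
import pandas as pd


def merge_rows(df, keep, absorb):
    """Return a copy of df in which row `absorb` is added to row `keep` and then dropped."""
    out = df.copy()
    # Add the absorbed row onto the surviving row; the row stays where it is.
    out.loc[keep] = out.loc[keep] + out.loc[absorb]
    # Remove the absorbed row by label; all other rows keep their order.
    return out.drop(index=absorb)


labels = ['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total']
testdf = pd.DataFrame(
    {'counts': [10541, 4143, 736, 18, 45690],
     'percents': [23.07, 9.07, 1.61, 0.04, 100.0]},
    index=labels,
)

result = merge_rows(testdf, 'Other / unknown', 'Uncoded & errors')
print(result)
```

Since the rows are addressed only by label, the helper should keep working if the table gains rows or the rows are reordered, as long as both labels exist.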


Max*_*axU 7

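Use groupby(..., sort=False).sum(). With sort=False the groups stay in the order in which their keys first appear, so the original row order is kept: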

In [84]: (testdf.reset_index()
   ....:        .replace({'index': {'Uncoded & errors':'Other / unknown'}})
   ....:        .groupby('index', sort=False).sum()
   ....: )
Out[84]:
                 counts  percents
index
Daylight          10541     23.07
Dawn               4143      9.07
Other / unknown     754      1.65
Total             45690    100.00
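A variant of the same idea, shown here as a self-contained sketch rather than as MaxU's exact code, renames the index label directly and groups on the index level, which avoids the reset_index step. The variable name `merged` is made up for illustration.

```python
import pandas as pd

labels = ['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total']
testdf = pd.DataFrame(
    {'counts': [10541, 4143, 736, 18, 45690],
     'percents': [23.07, 9.07, 1.61, 0.04, 100.0]},
    index=labels,
)

merged = (
    testdf
    # Give both rows the same label so they fall into one group.
    .rename(index={'Uncoded & errors': 'Other / unknown'})
    # Group on the index; sort=False keeps groups in first-appearance order.
    .groupby(level=0, sort=False)
    .sum()
)
print(merged)
```

To fold further rows into the same bucket, more entries can be added to the rename mapping.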