I have a dataframe, and I am trying to sum two of its rows without disturbing the row order.
> import pandas as pd
> test = {'counts' : pd.Series([10541,4143,736,18,45690], index=['Daylight','Dawn','Other / unknown','Uncoded & errors','Total']), 'percents' : pd.Series([23.07,9.07,1.61,0.04,100], index=['Daylight','Dawn','Other / unknown','Uncoded & errors','Total'])}
> testdf = pd.DataFrame(test)
                  counts  percents
Daylight           10541     23.07
Dawn                4143      9.07
Other / unknown      736      1.61
Uncoded & errors      18      0.04
Total              45690    100.00
I want this output:
                 counts  percents
Daylight          10541     23.07
Dawn               4143      9.07
Other / unknown     754      1.65   <-- sum of 'Other / unknown' and 'Uncoded & errors'
Total             45690    100.00
This is the closest I have been able to get:
> sum_ = testdf.loc[['Other / unknown', 'Uncoded & errors']].sum().to_frame().transpose()
   counts  percents
0  754.00      1.65
> sum_ = sum_.rename(index={0: 'Other / unknown'})
                 counts  percents
Other / unknown  754.00      1.65
> testdf.drop(['Other / unknown', 'Uncoded & errors'], inplace=True)
> testdf = pd.concat([testdf, sum_])  # originally testdf.append(sum_); DataFrame.append was removed in pandas 2.0
                 counts  percents
Daylight          10541     23.07
Dawn               4143      9.07
Total             45690    100.00
Other / unknown     754      1.65
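But this does not preserve the original row order: the summed row ends up after 'Total'.
I could insert the row by slicing the dataframe and placing the sum_ row between 'Dawn' and 'Total', but that will not work if the row labels ever change, or if the order of the rows changes, etc. (this is for an annual booklet, so the table design may change from year to year), so I am trying to do this robustly.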
Although I prefer MaxU's answer, you could also try summing in place:
testdf.loc['Other / unknown'] += testdf.loc['Uncoded & errors']
Then drop the leftover row by its index label:
testdf.drop(['Uncoded & errors'], inplace=True)
In [28]: testdf
Out[28]:
                 counts  percents
Daylight          10541     23.07
Dawn               4143      9.07
Other / unknown     754      1.65
Total             45690    100.00
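The two steps above can be wrapped into a small helper. This is only a sketch: the function name `merge_row_into` and its parameters are made up for illustration, and it assumes both labels exist exactly once in the index. It works on a copy, so the original frame is left untouched.

```python
import pandas as pd


def merge_row_into(df, source, target):
    """Add row `source` to row `target`, then drop `source`.

    Hypothetical helper: `target` keeps its position, so row order is preserved.
    """
    out = df.copy()
    out.loc[target] = out.loc[target] + out.loc[source]
    return out.drop(index=source)


labels = ['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total']
testdf = pd.DataFrame(
    {'counts': [10541, 4143, 736, 18, 45690],
     'percents': [23.07, 9.07, 1.61, 0.04, 100.0]},
    index=labels,
)

merged = merge_row_into(testdf, 'Uncoded & errors', 'Other / unknown')
print(merged)
```

Because the rows are addressed by label rather than by position, this should keep working if the rows are reordered in a later year, as long as the two labels themselves stay the same.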
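Use groupby(..., sort=False).sum(). Relabel 'Uncoded & errors' as 'Other / unknown' first, so that both rows fall into the same group; sort=False keeps the groups in order of first appearance: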
In [84]: (testdf.reset_index()
   ....:        .replace({'index': {'Uncoded & errors':'Other / unknown'}})
   ....:        .groupby('index', sort=False).sum()
   ....: )
Out[84]:
                 counts  percents
index
Daylight          10541     23.07
Dawn               4143      9.07
Other / unknown     754      1.65
Total             45690    100.00
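If several rows need to be folded together, the same groupby idea can be driven by a mapping. The following is a sketch under stated assumptions, not part of the answer above: `merge_map` is a hypothetical dictionary from "row to fold away" to "row that absorbs it", and `rename(index=...)` is used in place of `reset_index().replace(...)` to relabel the rows.

```python
import pandas as pd

labels = ['Daylight', 'Dawn', 'Other / unknown', 'Uncoded & errors', 'Total']
testdf = pd.DataFrame(
    {'counts': [10541, 4143, 736, 18, 45690],
     'percents': [23.07, 9.07, 1.61, 0.04, 100.0]},
    index=labels,
)

# Hypothetical mapping: label to fold away -> label that absorbs it.
merge_map = {'Uncoded & errors': 'Other / unknown'}

# Relabel the rows, then sum rows that now share a label.
# sort=False keeps the groups in order of first appearance.
result = (testdf.rename(index=merge_map)
                .groupby(level=0, sort=False)
                .sum())
print(result)
```

One caveat: the merged row lands wherever the first of the grouped labels appears, so if 'Uncoded & errors' ever came before 'Other / unknown' in the table, the combined row would take the earlier position.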