How to add a percentage calculation to a pandas result

rbu*_*rnz 7 python dataframe python-3.x pandas

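I have the following working code. I need to add a percentage column to monitor changes. I don't know how to do this in pandas, and I need to understand which parts of the code have to change.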

import pandas as pd
dl = []
with open('sampledata.txt') as f:
    for line in f:
        parts = line.split()
        # Cleaning data here.. Conversions to int/float etc,
        if not parts[3][:2].startswith('($'):
            parts.insert(3,'0')
        if len(parts) > 5:
            temp = ' '.join(parts[4:])
            parts = parts[:4] + [temp]
        parts[1] = int(parts[1])
        parts[2] = float(parts[2].replace(',', ''))
        parts[3] = float(parts[3].strip('($)'))
        dl.append(parts)
headers = ['col1', 'col2', 'col3', 'col4', 'col5']
df = pd.DataFrame(dl,columns=headers)
df = df.groupby(['col1','col5']).sum().reset_index()
df = df.sort_values('col2',ascending=False)
df['col4'] =  '($' + df['col4'].astype(str) + ')'
df = df[headers]
print(df)

sampledata.txt #-- sample data source file

alpha   1   54,00.01                    ABC DSW2S
bravo   3   500,000.00                  ACDEF
charlie 1   27,722.29 ($250.45)         DGAS-CAS
delta   2   11 ($10)                    SWSDSASS-CCSSW
echo    5   143,299.00 ($101)           ACS34S1
lima    6   45.00181 ($38.9)            FGF5GGD-DDD
falcon  3   0.1234                      DSS2SFS3
echo    8   145,300 ($125.01)           ACS34S1
charlie 10  252,336,733.383 ($492.06)   DGAS-CAS
romeo   12  980                         ASDS SSSS SDSD
falcon  5   9.19                        DSS2SFS3

Current output: #-- working result

      col1  col2          col3       col4            col5
4     echo    13  2.885990e+05  ($226.01)         ACS34S1
7    romeo    12  9.800000e+02     ($0.0)  ASDS SSSS SDSD
2  charlie    11  2.523645e+08  ($742.51)        DGAS-CAS
5   falcon     8  9.313400e+00     ($0.0)        DSS2SFS3
6     lima     6  4.500181e+01    ($38.9)     FGF5GGD-DDD
1    bravo     3  5.000000e+05     ($0.0)           ACDEF
3    delta     2  1.100000e+01    ($10.0)  SWSDSASS-CCSSW
0    alpha     1  5.400010e+03     ($0.0)       ABC DSW2S

Desired output: #-- with an additional % column

      col1  col2          col3       col4            col5   col6
4     echo    13  2.885990e+05  ($226.01)         ACS34S1   60%     #-- (5 + 8) = 13
7    romeo    12  9.800000e+02     ($0.0)  ASDS SSSS SDSD   0%
2  charlie    11  2.523645e+08  ($742.51)        DGAS-CAS   900%  #-- (1 + 10) = 11
5   falcon     8  9.313400e+00     ($0.0)        DSS2SFS3   66.67%  #-- (3 + 5) = 8
6     lima     6  4.500181e+01    ($38.9)     FGF5GGD-DDD   0%
1    bravo     3  5.000000e+05     ($0.0)           ACDEF   0%
3    delta     2  1.100000e+01    ($10.0)  SWSDSASS-CCSSW   0%
0    alpha     1  5.400010e+03     ($0.0)       ABC DSW2S   0%

Eri*_*and 1

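You can add the following lines after your code. The function compute_percentage() uses the list variable dl: for each row it looks up the original col2 values that share the row's col1 and returns the percent change from the first value to the last.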

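def compute_percentage(row):
    # col2 values for this row's col1, in the order they appear in the file
    vl = [float(parts[1]) for parts in dl if parts[0] == row['col1']]
    # Percent change from the first value to the last; 0 if the first value is 0
    i = round(100. * (vl[-1]-vl[0])/vl[0] if vl[0] != 0 else 0, 2)
    # Drop the decimals when the result is a whole number (60.0 -> 60)
    if float(int(i)) == i:
        i = int(i)
    return str(i) + '%'

df['col6'] = df.apply(compute_percentage, axis=1)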

Output:

      col1  col2          col3       col4            col5    col6
4     echo    13  2.885990e+05  ($226.01)         ACS34S1     60%
7    romeo    12  9.800000e+02     ($0.0)  ASDS SSSS SDSD      0%
2  charlie    11  2.523645e+08  ($742.51)        DGAS-CAS    900%
5   falcon     8  9.313400e+00     ($0.0)        DSS2SFS3  66.67%
6     lima     6  4.500181e+01    ($38.9)     FGF5GGD-DDD      0%
1    bravo     3  5.000000e+05     ($0.0)           ACDEF      0%
3    delta     2  1.100000e+01    ($10.0)  SWSDSASS-CCSSW      0%
0    alpha     1  5.400010e+03     ($0.0)       ABC DSW2S      0%
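
The same column can also be computed without scanning dl once per row. The following is a minimal sketch, not the answer's method: it takes the first and last col2 value of each group with groupby and computes the percent change from them. It assumes the change should be tracked per (col1, col5) pair, matching the question's groupby, whereas compute_percentage() matches on col1 only. The names raw, keys, pct and pct_labels are made up for this illustration, and dl is written out by hand with the values the question's parsing loop would yield from sampledata.txt.

```python
import pandas as pd

# Rows as the question's parsing loop would build them from sampledata.txt
# (written out by hand here so the sketch runs on its own).
dl = [
    ['alpha', 1, 5400.01, 0.0, 'ABC DSW2S'],
    ['bravo', 3, 500000.00, 0.0, 'ACDEF'],
    ['charlie', 1, 27722.29, 250.45, 'DGAS-CAS'],
    ['delta', 2, 11.0, 10.0, 'SWSDSASS-CCSSW'],
    ['echo', 5, 143299.00, 101.0, 'ACS34S1'],
    ['lima', 6, 45.00181, 38.9, 'FGF5GGD-DDD'],
    ['falcon', 3, 0.1234, 0.0, 'DSS2SFS3'],
    ['echo', 8, 145300.0, 125.01, 'ACS34S1'],
    ['charlie', 10, 252336733.383, 492.06, 'DGAS-CAS'],
    ['romeo', 12, 980.0, 0.0, 'ASDS SSSS SDSD'],
    ['falcon', 5, 9.19, 0.0, 'DSS2SFS3'],
]
headers = ['col1', 'col2', 'col3', 'col4', 'col5']
raw = pd.DataFrame(dl, columns=headers)

# Assumption: changes are tracked per (col1, col5) pair, like the question's groupby.
keys = ['col1', 'col5']
grouped = raw.groupby(keys, sort=False)

# First and last col2 value of each group, in file order.
first = grouped['col2'].first()
last = grouped['col2'].last()

# Percent change from first to last; a first value of 0 yields 0 instead of a division error.
pct = ((last - first) / first.where(first != 0) * 100).fillna(0).round(2)

# Format as text and trim trailing zeros, so 60.0 becomes '60%' and 66.67 stays '66.67%'.
pct_labels = pct.map(lambda v: f'{v:.2f}'.rstrip('0').rstrip('.') + '%').rename('col6')

# Sum the numeric columns as in the question, then attach col6 by group key.
df = grouped.sum().reset_index()
df = df.merge(pct_labels.reset_index(), on=keys, how='left')
df = df.sort_values('col2', ascending=False)
df['col4'] = '($' + df['col4'].astype(str) + ')'
df = df[headers + ['col6']]
print(df)
```

On the sample data this should give the same col6 values as the output above. The percent change has to be computed from the unsummed rows, because the first and last values are no longer available once the groups have been summed.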