Lea*_*ava 87 python sum dataframe pandas
Goal
I have a Pandas data frame, shown below, with multiple columns, and would like to get the total of the column MyColumn.
Data frame - df:
print df
X MyColumn Y Z
0 A 84 13.0 69.0
1 B 76 77.0 127.0
2 C 28 69.0 16.0
3 D 28 28.0 31.0
4 E 19 20.0 85.0
5 F 84 193.0 70.0
My attempt:
I tried to get the sum of the column using groupby and .sum():
Total = df.groupby['MyColumn'].sum()
print Total
This causes the following error:
TypeError: 'instancemethod' object has no attribute '__getitem__'
Expected output
I expected the output to be:
319
Alternatively, I would like df to be edited with a new row labelled TOTAL that contains the total:
X MyColumn Y Z
0 A 84 13.0 69.0
1 B 76 77.0 127.0
2 C 28 69.0 16.0
3 D 28 28.0 31.0
4 E 19 20.0 85.0
5 F 84 193.0 70.0
TOTAL 319
jez*_*ael 161
You should use sum:
Total = df['MyColumn'].sum()
print (Total)
319
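For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame from the question and also shows why the original attempt failed: groupby is a method, so it has to be called before anything can be subscripted. The variable names attempt_failed and total are only illustrative.

```python
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({
    "X": list("ABCDEF"),
    "MyColumn": [84, 76, 28, 28, 19, 84],
    "Y": [13.0, 77.0, 69.0, 28.0, 20.0, 193.0],
    "Z": [69.0, 127.0, 16.0, 31.0, 85.0, 70.0],
})

# df.groupby is a method; subscripting it without calling it raises TypeError.
try:
    df.groupby["MyColumn"]
    attempt_failed = False
except TypeError:
    attempt_failed = True

# Selecting the column gives a Series, and Series.sum() returns a scalar.
total = df["MyColumn"].sum()
print(attempt_failed, total)
```

No groupby is needed here, because there is nothing to group by: the whole column is summed at once.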
Then use loc with a Series. In that case the index of the Series should be set to the name of the column you need to sum:
df.loc['Total'] = pd.Series(df['MyColumn'].sum(), index = ['MyColumn'])
print (df)
X MyColumn Y Z
0 A 84.0 13.0 69.0
1 B 76.0 77.0 127.0
2 C 28.0 69.0 16.0
3 D 28.0 28.0 31.0
4 E 19.0 20.0 85.0
5 F 84.0 193.0 70.0
Total NaN 319.0 NaN NaN
This is needed because if you pass a scalar, every column of the new row is filled with it:
df.loc['Total'] = df['MyColumn'].sum()
print (df)
X MyColumn Y Z
0 A 84 13.0 69.0
1 B 76 77.0 127.0
2 C 28 69.0 16.0
3 D 28 28.0 31.0
4 E 19 20.0 85.0
5 F 84 193.0 70.0
Total 319 319 319.0 319.0
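The difference between the two assignments can be checked side by side. This is a minimal sketch, assuming the frame from the question; the names aligned and broadcast are only illustrative, and copies are used so that neither assignment affects the other.

```python
import pandas as pd

df = pd.DataFrame({
    "X": list("ABCDEF"),
    "MyColumn": [84, 76, 28, 28, 19, 84],
    "Y": [13.0, 77.0, 69.0, 28.0, 20.0, 193.0],
    "Z": [69.0, 127.0, 16.0, 31.0, 85.0, 70.0],
})
total = df["MyColumn"].sum()

# A Series is aligned on its index, so only the MyColumn cell is filled;
# the other cells of the new row become NaN.
aligned = df.copy()
aligned.loc["Total"] = pd.Series(total, index=["MyColumn"])

# A bare scalar is broadcast to every column of the new row, including X.
broadcast = df.copy()
broadcast.loc["Total"] = total

print(aligned.loc["Total"].isna().tolist())
print(broadcast.loc["Total"].tolist())
```

The NaN cells are also why MyColumn is displayed as 84.0 rather than 84 in the aligned variant.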
Two other options are at and ix:
df.at['Total', 'MyColumn'] = df['MyColumn'].sum()
print (df)
X MyColumn Y Z
0 A 84.0 13.0 69.0
1 B 76.0 77.0 127.0
2 C 28.0 69.0 16.0
3 D 28.0 28.0 31.0
4 E 19.0 20.0 85.0
5 F 84.0 193.0 70.0
Total NaN 319.0 NaN NaN
df.ix['Total', 'MyColumn'] = df['MyColumn'].sum()
print (df)
X MyColumn Y Z
0 A 84.0 13.0 69.0
1 B 76.0 77.0 127.0
2 C 28.0 69.0 16.0
3 D 28.0 28.0 31.0
4 E 19.0 20.0 85.0
5 F 84.0 193.0 70.0
Total NaN 319.0 NaN NaN
Note: ix has been deprecated since pandas v0.20 and was removed in v1.0. Use loc or iloc instead.
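For pandas versions without ix, the same cell can be set by label with loc or at, and read back by position with iloc. This is a minimal sketch, assuming the frame from the question; with_loc, with_at and col_pos are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "X": list("ABCDEF"),
    "MyColumn": [84, 76, 28, 28, 19, 84],
    "Y": [13.0, 77.0, 69.0, 28.0, 20.0, 193.0],
    "Z": [69.0, 127.0, 16.0, 31.0, 85.0, 70.0],
})
total = df["MyColumn"].sum()

# loc is label-based and creates the "Total" row if it does not exist yet.
with_loc = df.copy()
with_loc.loc["Total", "MyColumn"] = total

# at is the label-based accessor for a single cell.
with_at = df.copy()
with_at.at["Total", "MyColumn"] = total

# iloc is purely positional: last row, position of the MyColumn column.
col_pos = with_loc.columns.get_loc("MyColumn")
print(with_loc.iloc[-1, col_pos], with_at.iloc[-1, col_pos])
```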
Psi*_*dom 16
Another option you can use here:
df.loc["Total", "MyColumn"] = df.MyColumn.sum()
# X MyColumn Y Z
#0 A 84.0 13.0 69.0
#1 B 76.0 77.0 127.0
#2 C 28.0 69.0 16.0
#3 D 28.0 28.0 31.0
#4 E 19.0 20.0 85.0
#5 F 84.0 193.0 70.0
#Total NaN 319.0 NaN NaN
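You can also use the append() method (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so this needs an older pandas):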
df.append(pd.DataFrame(df.MyColumn.sum(), index = ["Total"], columns=["MyColumn"]))
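On pandas versions where DataFrame.append is no longer available, pd.concat can play the same role. This is a sketch under that assumption, not part of the original answer; total_row and result are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "X": list("ABCDEF"),
    "MyColumn": [84, 76, 28, 28, 19, 84],
    "Y": [13.0, 77.0, 69.0, 28.0, 20.0, 193.0],
    "Z": [69.0, 127.0, 16.0, 31.0, 85.0, 70.0],
})

# One-row frame that only has the MyColumn column.
total_row = pd.DataFrame({"MyColumn": [df["MyColumn"].sum()]}, index=["Total"])

# concat returns a new frame; columns missing from total_row become NaN.
result = pd.concat([df, total_row])
print(result)
```

Like append, this leaves df itself untouched, so assign the result if you want to keep it.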
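Update:
If you need to append the sum for all numeric columns, you can do one of the following:
Use append to do this in a functional manner (it does not change the original data frame):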
# select numeric columns and calculate the sums
sums = df.select_dtypes('number').sum().rename('total')
# append sums to the data frame
df.append(sums)
# X MyColumn Y Z
#0 A 84.0 13.0 69.0
#1 B 76.0 77.0 127.0
#2 C 28.0 69.0 16.0
#3 D 28.0 28.0 31.0
#4 E 19.0 20.0 85.0
#5 F 84.0 193.0 70.0
#total NaN 319.0 400.0 398.0
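The same functional step can be sketched with pd.concat for pandas versions that no longer ship DataFrame.append. This assumes the frame from the question; the Series of sums is turned into a one-row frame first, and the names sums and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "X": list("ABCDEF"),
    "MyColumn": [84, 76, 28, 28, 19, 84],
    "Y": [13.0, 77.0, 69.0, 28.0, 20.0, 193.0],
    "Z": [69.0, 127.0, 16.0, 31.0, 85.0, 70.0],
})

# Sum every numeric column; the Series name becomes the new row label.
sums = df.select_dtypes("number").sum().rename("total")

# to_frame().T turns the Series into a single row labelled "total".
result = pd.concat([df, sums.to_frame().T])
print(result)
```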
Use loc to mutate the data frame in place:
df.loc['total'] = df.select_dtypes('number').sum()
df
# X MyColumn Y Z
#0 A 84.0 13.0 69.0
#1 B 76.0 77.0 127.0
#2 C 28.0 69.0 16.0
#3 D 28.0 28.0 31.0
#4 E 19.0 20.0 85.0
#5 F 84.0 193.0 70.0
#total NaN 319.0 400.0 398.0
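One caveat with the in-place version: if the frame already contains a total row, that row is included in the next sum, so running the assignment twice doubles the totals. A minimal sketch of one way to guard against this follows; add_total_row is a hypothetical helper, not a pandas function.

```python
import pandas as pd

def add_total_row(frame, label="total"):
    """Write the numeric column sums into row `label`, ignoring an old total row."""
    # Drop a previous total row (if any) before summing, so it is not counted.
    body = frame.drop(index=label, errors="ignore")
    frame.loc[label] = body.select_dtypes("number").sum()
    return frame

df = pd.DataFrame({
    "X": list("ABCDEF"),
    "MyColumn": [84, 76, 28, 28, 19, 84],
    "Y": [13.0, 77.0, 69.0, 28.0, 20.0, 193.0],
    "Z": [69.0, 127.0, 16.0, 31.0, 85.0, 70.0],
})

add_total_row(df)
add_total_row(df)  # a second call overwrites the row instead of doubling it
print(df)
```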
Anonymous 5
Similar to getting the length of a data frame with len(df), the following works for both pandas and blaze:
Total = sum(df['MyColumn'])
Or
Total = sum(df.MyColumn)
print(Total)
Anonymous 5
There are two ways to sum a column:
dataset = pd.read_csv("data.csv")
1: sum(dataset.Column_Name)
2: dataset['Column_Name'].sum()
Please correct me if there is anything wrong with this.
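The two ways agree on the data in the question, but they can differ once the column contains missing values: the built-in sum propagates NaN, while Series.sum() skips NaN by default. A small sketch with made-up values (the NaN is an assumption for illustration, it is not in the question's data):

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
s = pd.Series([84.0, np.nan, 28.0], name="MyColumn")

builtin_total = sum(s)              # plain Python loop, NaN propagates
method_total = s.sum()              # skipna=True by default
strict_total = s.sum(skipna=False)  # opt in to NaN propagation

print(builtin_total, method_total, strict_total)
```

The built-in sum also iterates element by element in Python, so Series.sum() is typically the faster choice on large columns.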