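I'm a very new Python user trying to sum columns of data in a .csv file. I found other answers that really helped me get started (e.g. here and here).
However, my problem is that I want to loop over my file to get the sum of every column.
My formatted data looks like this: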
z y x w v u
a 0 8 7 6 0 5
b 0 0 5 4 0 3
c 0 2 3 4 0 3
d 0 6 7 8 0 9
Or, in .csv format:
,z,y,x,w,v,u
a,0,8,7,6,0,5
b,0,0,5,4,0,3
c,0,2,3,4,0,3
d,0,6,7,8,0,9
For now I'm just trying to get the iteration working; I'll worry about the summing later. Here is my code:
import csv
data = file("test.csv", "r")
headerrow = data.next()
headerrow = headerrow.strip().split(",")
end = len(headerrow)
for i in range(1, end):
    for row in csv.reader(data):
        print row[i]
Here's what I get:
>>>
0
0
0
0
>>>
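So it prints the value at index 1 for every row, but never moves on to the other indices.
What obvious thing am I missing here?
Update:
Following the very helpful suggestions and explanation, I now have this: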
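One likely explanation, offered as a sketch rather than a definitive diagnosis: `csv.reader(data)` reads from the same file object on every pass of the outer loop, so the first pass (i = 1) consumes all the rows and later passes find nothing left to read. The minimal Python 3 sketch below reads the rows into a list once and then loops over the columns. It assumes the comma-separated sample above, loaded from a string with `io.StringIO` instead of opening test.csv; the names `SAMPLE`, `rows` and `column_values` are illustrative only.

```python
import csv
import io

# Sample data from above; io.StringIO stands in for open("test.csv").
SAMPLE = """,z,y,x,w,v,u
a,0,8,7,6,0,5
b,0,0,5,4,0,3
c,0,2,3,4,0,3
d,0,6,7,8,0,9
"""

data = io.StringIO(SAMPLE)
reader = csv.reader(data)
headerrow = next(reader)   # ['', 'z', 'y', 'x', 'w', 'v', 'u']
rows = list(reader)        # read every remaining row exactly once

column_values = {}
for i in range(1, len(headerrow)):
    # rows is a list, so it can be traversed again for every column index
    column_values[headerrow[i]] = [row[i] for row in rows]
    print(headerrow[i], column_values[headerrow[i]])
```

Because `rows` is materialized up front, the outer loop now visits every column instead of stopping after index 1.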
import csv
with open("test.csv") as data:
    headerrow = next(data)
    delim = "," if "," == headerrow[0] else " "
    headerrow = filter(None, headerrow.rstrip().split(delim))
    reader = csv.reader(data, delimiter=delim, skipinitialspace=True)
    zipped = zip(*reader)
    print zipped
    strings = next(zipped)
    print ([sum(map(int,col)) for col in zipped])
This returns an error:
Traceback (most recent call last):
  File "C:\Users\the hexarch\Desktop\remove_total_absences_test.py", line 9, in <module>
    strings = next(zipped)
TypeError: list object is not an iterator
I don't understand...? Sorry!
import csv
with open('in.csv') as f:
    head = next(f)
    # decide the delimiter from what is in the header
    delim = "," if "," == head[0] else " "
    # filter out the empty strings that split() produces
    head = filter(None, head.rstrip().split(delim))
    # skipinitialspace must be set because the space-delimited data uses two spaces between fields
    reader = csv.reader(f, delimiter=delim, skipinitialspace=True)
    # transpose rows into columns
    zipped = zip(*reader)
    # skip the first column (the row labels a, b, c, d)
    strings = next(zipped)
    # sum each remaining column
    print([sum(map(int, col)) for col in zipped])
Output:
[0, 16, 22, 22, 0, 20]
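To build a dict that maps each header letter to its column sum, use this line in place of the final print (zipped is a one-shot iterator, so it is already empty once the list comprehension above has run):
print(dict(zip(list(head), (sum(map(int, col)) for col in zipped))))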
Which outputs:
{'u': 20, 'w': 22, 'x': 22, 'z': 0, 'y': 16, 'v': 0}
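If transposing with `zip(*reader)` feels awkward, a different technique (not the one used above) is `csv.DictReader`, which maps each row to the header names so totals can be accumulated in a single pass. A minimal Python 3 sketch, assuming the comma-separated sample loaded from a string; the names `SAMPLE`, `value_columns` and `totals` are illustrative only:

```python
import csv
import io

SAMPLE = """,z,y,x,w,v,u
a,0,8,7,6,0,5
b,0,0,5,4,0,3
c,0,2,3,4,0,3
d,0,6,7,8,0,9
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
# The first header cell is empty, so the row-label column has the key "".
value_columns = [name for name in reader.fieldnames if name]
totals = dict.fromkeys(value_columns, 0)

for row in reader:  # one pass over the data, no transposition
    for name in value_columns:
        totals[name] += int(row[name])

print(totals)
# {'z': 0, 'y': 16, 'x': 22, 'w': 22, 'v': 0, 'u': 20}
```

On Python 3.7+ the dict keeps the header order, so the printed order differs from the older output shown above.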
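I used Python 3 for all of the above. In Python 2, zip, filter and map return lists rather than iterators, which is why next(zipped) raised "list object is not an iterator" in the update. If you use Python 2, substitute:
zip -> itertools.izip
filter -> itertools.ifilter
map -> itertools.imap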
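To see why the substitutions matter: a list is iterable but is not an iterator, so calling `next()` on it raises a TypeError. Wrapping the object in the built-in `iter()` gives an iterator on both Python 2 and Python 3. The sketch below runs on Python 3 and uses `list(zip(...))` to stand in for what Python 2's `zip` returns; the two-row `rows` data is a made-up miniature of the sample.

```python
rows = [["a", "0", "8"], ["b", "0", "0"]]

columns_list = list(zip(*rows))  # what Python 2's zip() returns: a plain list
try:
    next(columns_list)
except TypeError as exc:
    error_message = str(exc)     # "'list' object is not an iterator" on Python 3

columns = iter(zip(*rows))       # iter() yields an iterator on Python 2 and 3
labels = next(columns)           # ('a', 'b'): the row-label column
sums = [sum(map(int, col)) for col in columns]
print(labels, sums)
```

Using `iter(zip(*reader))` is one way to keep a single script working under both versions without importing `izip`.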
Python 2 code:
import csv
from itertools import izip, imap, ifilter
with open('in.csv') as f:
    head = next(f)
    # decide the delimiter from what is in the header
    delim = "," if "," == head[0] else " "
    # filter out the empty strings that split() produces
    head = ifilter(None, head.rstrip().split(delim))
    # skipinitialspace must be set because the space-delimited data uses two spaces between fields
    reader = csv.reader(f, delimiter=delim, skipinitialspace=True)
    # transpose rows into columns (izip returns an iterator, so next() works)
    zipped = izip(*reader)
    # skip the first column (the row labels a, b, c, d)
    strings = next(zipped)
    # sum each remaining column
    print([sum(imap(int, col)) for col in zipped])
Output:
[0, 16, 22, 22, 0, 20]
If you do a lot of this kind of work, pandas, and pandas.read_csv in particular, may be useful. Below is a very basic example that pandas experts may want to expand on:
import pandas as pd
df = pd.read_csv("in.csv")
print(df.sum())
Output:
Unnamed: 0 abcd
z 0
y 16
x 22
w 22
v 0
u 20
dtype: object
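In the output above, the unnamed first column (a, b, c, d) is summed too, which concatenates the labels into "abcd" and forces an object dtype. A minimal sketch, assuming pandas is installed and the comma-separated sample is loaded from a string instead of in.csv, that passes `index_col=0` so the label column becomes the row index and only the numeric columns are summed:

```python
import io

import pandas as pd

SAMPLE = """,z,y,x,w,v,u
a,0,8,7,6,0,5
b,0,0,5,4,0,3
c,0,2,3,4,0,3
d,0,6,7,8,0,9
"""

# index_col=0 turns the unnamed first column (a, b, c, d) into the row index,
# so df contains only the numeric columns z..u.
df = pd.read_csv(io.StringIO(SAMPLE), index_col=0)
totals = df.sum()  # one sum per column
print(totals)
```

For the space-separated version of the data, `read_csv` also accepts a regular-expression separator such as `sep=r"\s+"`, though that variant is not exercised here.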