yun*_*cat 5 python csv python-2.x
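I am trying to count how often each value occurs in one column of a CSV file and write that count into another CSV column, using Python.
For example, my CSV file: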
KeyID GeneralID
145258 KL456
145259 BG486
145260 HJ789
145261 KL456
What I want is to count how many rows share the same GeneralID and then insert that count into a new CSV column. For example,
KeyID Total_GeneralID
145258 2
145259 1
145260 1
145261 2
I tried splitting each line into columns with the split method, but it did not work well.
My code:
case_id_list_data = []
with open(file_path_1, "rU") as g:
    for line in g:
        case_id_list_data.append(line.split('\t'))
        #print case_id_list_data[0][0] #the result is dissatisfying
        #I'm stuck here..
If you would rather not use pandas and want to stick with the standard library:
Code:
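import csv
from collections import Counter

# Read the tab-separated input: the header first, then all data rows.
with open('file1', 'rU') as f:
    reader = csv.reader(f, delimiter='\t')
    header = next(reader)
    lines = [line for line in reader]

# Count how often each GeneralID (second column) occurs.
counts = Counter([l[1] for l in lines])

# Append each row's GeneralID count as a new last column.
new_lines = [l + [str(counts[l[1]])] for l in lines]

# Python 2: the csv module expects the output file in binary mode.
with open('file2', 'wb') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(header + ['Total_GeneralID'])
    writer.writerows(new_lines)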
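The code above is written for Python 2, as the question is tagged. On Python 3 the file modes need to change: the csv module expects text-mode files opened with newline="", and the 'U' flag was removed in Python 3.11. Here is a rough sketch of the same Counter approach for Python 3. The function name add_count_column and its parameters are made up for illustration, and it assumes every data row has at least two tab-separated fields.

```python
import csv
from collections import Counter


def add_count_column(src_path, dst_path, column=1, new_header="Total_GeneralID"):
    """Copy a tab-separated file, appending how often each row's value in `column` occurs."""
    # newline="" lets the csv module handle line endings itself (Python 3).
    with open(src_path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)
        rows = list(reader)

    # One pass over the rows to count each value in the chosen column.
    counts = Counter(row[column] for row in rows)

    with open(dst_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(header + [new_header])
        # Append the count for each row's value as a new last field.
        writer.writerows(row + [str(counts[row[column]])] for row in rows)
```

Calling add_count_column('file1', 'file2') should write the same three columns as the Python 2 version. A blank line in the input would raise an IndexError here, so filter those out first if your file may contain them.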
Result:
KeyID GeneralID Total_GeneralID
145258 KL456 2
145259 BG486 1
145260 HJ789 1
145261 KL456 2
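The result above keeps the GeneralID column, whereas the question asked for only KeyID and Total_GeneralID. If you want exactly those two columns, one possible variation is to read the rows by column name with csv.DictReader and write out just the key and the count. This is a Python 3 sketch; key_totals and SAMPLE are hypothetical names, and the sample text stands in for the real file so the snippet runs on its own.

```python
import csv
import io
from collections import Counter

# Stand-in for the input file from the question (tab-separated).
SAMPLE = "KeyID\tGeneralID\n145258\tKL456\n145259\tBG486\n145260\tHJ789\n145261\tKL456\n"


def key_totals(text):
    """Return (KeyID, count) pairs, where count is how many rows share that row's GeneralID."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    counts = Counter(row["GeneralID"] for row in rows)
    return [(row["KeyID"], counts[row["GeneralID"]]) for row in rows]


# Write the two-column table to an in-memory buffer and print it.
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow(["KeyID", "Total_GeneralID"])
writer.writerows(key_totals(SAMPLE))
print(out.getvalue())
```

To work on a real file, read its contents into a string (or pass the open file straight to csv.DictReader) and replace the in-memory buffer with a file opened with "w" and newline="".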