Count duplicate values in a specific column of a CSV file and return the count in another column (Python 2)

Question

I'm trying to count the duplicate values in one column of a CSV file and write that count to another CSV column, using Python.

For example, my CSV file:

KeyID    GeneralID
145258   KL456
145259   BG486
145260   HJ789
145261   KL456

What I want to do is count how many rows share the same GeneralID and insert that count into a new CSV column. For example:

KeyID    Total_GeneralID
145258   2
145259   1
145260   1
145261   2

I tried splitting each line into columns with the split method, but it didn't work very well.

My code:

case_id_list_data = []

with open(file_path_1, "rU") as g:
    for line in g:
        case_id_list_data.append(line.split('\t'))
        #print case_id_list_data[0][0] #the result is dissatisfying 
        #I'm stuck here..

Answer 1

Ste*_*uch 3

If you'd rather not use pandas and want to stick with the standard library:

Code:

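import csv
from collections import Counter

# Read the tab-delimited input (Python 2: 'rU' enables universal newlines)
with open('file1', 'rU') as f:
    reader = csv.reader(f, delimiter='\t')
    header = next(reader)  # the first row holds the column names
    lines = [line for line in reader]
    # Count how often each GeneralID (second column) occurs
    counts = Counter([l[1] for l in lines])

# Append each row's GeneralID count as a new last column
new_lines = [l + [str(counts[l[1]])] for l in lines]

# Python 2: the csv module expects the output file opened in binary mode
with open('file2', 'wb') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(header + ['Total_GeneralID'])
    writer.writerows(new_lines)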

Result:

KeyID   GeneralID   Total_GeneralID
145258  KL456   2
145259  BG486   1
145260  HJ789   1
145261  KL456   2
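
The code above targets Python 2. For anyone on Python 3, here is a rough sketch of the same Counter approach. The function name add_duplicate_counts and its parameters are made up for illustration, and it assumes the file is tab-delimited with a header row, as in the answer. The main differences are that Python 3 opens csv files in text mode with newline='' instead of 'rU' / 'wb', and that the column is looked up by name rather than by a fixed index.

```python
import csv
from collections import Counter


def add_duplicate_counts(src_path, dst_path, column="GeneralID",
                         new_column="Total_GeneralID", delimiter="\t"):
    """Copy a delimited file, appending how often each row's `column` value occurs.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # Python 3: csv files are opened in text mode with newline=''
    with open(src_path, newline="") as src:
        reader = csv.reader(src, delimiter=delimiter)
        header = next(reader)                  # first row holds the column names
        rows = [row for row in reader if row]  # skip completely blank lines

    idx = header.index(column)  # raises ValueError if the column is missing
    counts = Counter(row[idx] for row in rows)

    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter=delimiter)
        writer.writerow(header + [new_column])
        # csv.writer converts the integer counts to text on output
        writer.writerows(row + [counts[row[idx]]] for row in rows)
```

Calling add_duplicate_counts('file1', 'file2') should then write the same three-column output as shown in the result above, provided the input really is tab-separated.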
