Edu*_*rdo 2 python csv duplicates
I'm puzzled by the problem illustrated in the following example:
"ID","NAME","PHONE","REF","DISCARD"
1,"JOHN",12345,,
2,"PETER",6232,,
3,"JON",12345,,
4,"PETERSON",6232,,
5,"ALEX",7854,,
6,"JON",12345,,
I want to detect duplicates in the "PHONE" column and flag the subsequent duplicates using the "REF" column, with a value pointing to the "ID" of the first item, and the value "Yes" in the "DISCARD" column:
"ID","NAME","PHONE","REF","DISCARD"
1,"JOHN",12345,1,
2,"PETER",6232,2,
3,"JON",12345,1,"Yes"
4,"PETERSON",6232,2,"Yes"
5,"ALEX",7854,,
6,"JON",12345,1,"Yes"
So, how should I go about it? I tried this code, but of course my logic is not correct.
import csv
myfile = open("C:\Users\Eduardo\Documents\TEST2.csv", "rb")
myfile1 = open("C:\Users\Eduardo\Documents\TEST2.csv", "rb")
dest = csv.writer(open("C:\Users\Eduardo\Documents\TESTFIXED.csv", "wb"), dialect="excel")
reader = csv.reader(myfile)
verum = list(reader)
verum.sort(key=lambda x: x[2])
for i, row in enumerate(verum):
    if row[2] == verum[i][2]:
        verum[i][3] = row[0]
print verum
Thanks a lot for your guidance and help.
The only thing you have to keep in memory while running is a map from each phone number to the ID of the first row that used it.
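import csv

# phone number -> ID of the first row that used it
phone_to_id = {}
# Python 3: newline='' lets the csv module handle line endings itself
with open(r'c:\temp\input.csv', 'r', newline='') as fin:
    reader = csv.reader(fin)
    with open(r'c:\temp\output.csv', 'w', newline='') as fout:
        writer = csv.writer(fout)
        # omit this if the file has no header row
        writer.writerow(next(reader))
        for row in reader:
            (row_id, name, phone, ref, discard) = row
            if phone in phone_to_id:
                # duplicate: point at the first row and flag it for discard
                ref = phone_to_id[phone]
                discard = "Yes"
            else:
                phone_to_id[phone] = row_id
            writer.writerow((row_id, name, phone, ref, discard))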
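One detail of the expected output in the question is that the first row of each duplicated phone number (IDs 1 and 2) also carries its own ID in "REF", while a number that occurs only once (ID 5) keeps "REF" empty. A single pass cannot know at the first row whether a duplicate will follow, so a two-pass variant may be needed to reproduce that output exactly. The following is only a minimal sketch under that reading: the function name `mark_duplicates` and the in-memory `SAMPLE` string are illustrative assumptions, not part of the original answer, and this version holds all rows in memory rather than only the phone-to-ID map.

```python
import csv
import io
from collections import Counter


def mark_duplicates(rows):
    """Fill REF and DISCARD for rows shaped [id, name, phone, ref, discard].

    Hypothetical helper: the header row must already be stripped.
    """
    # Pass 1: count how often each phone number occurs.
    counts = Counter(row[2] for row in rows)

    # Pass 2: fill REF for every row of a duplicated number,
    # and DISCARD only for the second and later occurrences.
    first_id = {}
    result = []
    for row_id, name, phone, ref, discard in rows:
        if counts[phone] > 1:
            if phone in first_id:
                ref = first_id[phone]
                discard = "Yes"
            else:
                first_id[phone] = row_id
                ref = row_id
        result.append([row_id, name, phone, ref, discard])
    return result


# The sample data from the question, used here instead of a file on disk.
SAMPLE = '''"ID","NAME","PHONE","REF","DISCARD"
1,"JOHN",12345,,
2,"PETER",6232,,
3,"JON",12345,,
4,"PETERSON",6232,,
5,"ALEX",7854,,
6,"JON",12345,,
'''

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)
result = mark_duplicates(list(reader))

for row in [header] + result:
    print(row)
```

To run this against real files, the `io.StringIO(SAMPLE)` source could be swapped for a file opened with `newline=''`, and the rows written back out with `csv.writer`. For inputs too large to hold in memory, reading the file twice (once to count, once to write) would be an alternative.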