How to combine multiple rows of a CSV into one row

Kei*_*onO 4 python csv

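I've been given a large CSV file that I need to strip down for use in machine learning. I've managed to find a way to cut the file down to the 2 columns I need, but I have a problem.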

My file is basically structured like this:

 "David", "Red"
 "David", "Ford"
 "David", "Blue"
 "David", "Aspergers"
 "Steve", "Red"
 "Steve", "Vauxhall"

I need the data to look more like this:

"David", "Red", "Ford", "Blue", "Aspergers"
"Steve", "Red", "Vauxhall"

This is what I currently have to strip the CSV file:

import csv

# Python 2: open the input in binary mode and let the with-block close it
with open("traits.csv", "rb") as infile:
    cr = csv.reader(infile, delimiter=',')
    next(cr)  # skip the header line; no point removing it, as I need to standardise data manipulation

    # Print out the id of species and trait values
    print 'Stripping input'
    vals = [(row[1], row[4]) for row in cr]
    print str(vals) + '\n'

with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(vals)
    print 'Successfully written to file output.csv'


#for row in cr:
#print row
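The sample rows above put a space after each comma, and that matters when they are parsed. Below is a minimal Python 3 sketch (the code in this thread is Python 2) that shows how the standard `csv` module reads two of those rows with and without its `skipinitialspace` dialect option. `SAMPLE` and `parse` are illustrative names and are not part of the original code:

```python
import csv
import io

# Two of the sample rows from the question, with a space after each comma.
SAMPLE = '"David", "Red"\n"David", "Ford"\n'

def parse(text, skip_space):
    """Parse CSV text, optionally skipping whitespace right after each delimiter."""
    return list(csv.reader(io.StringIO(text), skipinitialspace=skip_space))

default_rows = parse(SAMPLE, skip_space=False)
clean_rows = parse(SAMPLE, skip_space=True)

print(default_rows)  # [['David', ' "Red"'], ['David', ' "Ford"']]
print(clean_rows)    # [['David', 'Red'], ['David', 'Ford']]
```

With the default settings, the leading space stops the opening quote from being recognised, so the second field keeps both the space and its quotes. That is why the answer below strips `'" '` from each field. Passing `skipinitialspace=True` avoids the problem at parse time.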

Kas*_*mvd 5

Use a dictionary that stores the names as keys and the other attributes in a list as the values:

import csv

my_dict = {}
with open("traits.csv", "rb") as f:
    cr = csv.reader(f, delimiter=',')
    for row in cr:
        # strip the quotes and spaces left around each field by the ", " separators
        my_dict.setdefault(row[0].strip('" '), []).append(row[1].strip('" '))

Result:

print my_dict
{'Steve': ['Red', 'Vauxhall'], 'David': ['Red', 'Ford', 'Blue', 'Aspergers']}
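If the rows for each name are always contiguous, as they are in the sample, `itertools.groupby` can build the same combined rows without a dictionary. This is a Python 3 sketch under that assumption, and the sample is inlined as `SAMPLE` in place of traits.csv. `groupby` only merges adjacent rows, so a name that reappears later in the file would produce a second, separate group. The dictionary approach does not have that limitation.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Illustrative input in the question's format; assumes rows for a name are contiguous.
SAMPLE = '''"David", "Red"
"David", "Ford"
"David", "Blue"
"David", "Aspergers"
"Steve", "Red"
"Steve", "Vauxhall"
'''

# skipinitialspace=True drops the space after each comma so quotes are parsed.
reader = csv.reader(io.StringIO(SAMPLE), skipinitialspace=True)

# groupby merges runs of *adjacent* rows that share the same key (column 0).
combined = [
    [name] + [row[1] for row in rows]
    for name, rows in groupby(reader, key=itemgetter(0))
]

for line in combined:
    print(line)
```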

And to write it to a new file:

with open("output.csv", "wb") as f:
    writer = csv.writer(f,delimiter=',')
    for i,j in my_dict.iteritems():
        writer.writerow([i]+j)
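For reference, here is a Python 3 sketch of the same dictionary approach. It assumes the name is in column 0 and the trait in column 1, as in the sample. It uses `collections.defaultdict(list)` in place of `setdefault` and `skipinitialspace=True` in place of stripping quotes by hand. It also opens files in text mode with `newline=""`, as the Python 3 `csv` documentation recommends. `combine_rows` is a hypothetical helper name:

```python
import csv
from collections import defaultdict

def combine_rows(in_path, out_path):
    """Group (name, trait) rows by name and write one line per name.

    Assumes the name is in column 0 and the trait in column 1, as in the sample.
    Returns the name -> traits mapping that was written.
    """
    traits = defaultdict(list)  # name -> list of traits, in file order
    with open(in_path, newline="") as f:
        for row in csv.reader(f, skipinitialspace=True):
            if len(row) >= 2:  # skip blank or short lines
                traits[row[0]].append(row[1])

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for name, values in traits.items():
            writer.writerow([name] + values)
    return traits
```

Usage: `combine_rows("traits.csv", "output.csv")`. On Python 3.7+ dictionaries keep insertion order, so the names are written in the order they first appear. A Python 2 dict makes no such guarantee, which is why the printed result above can list Steve before David.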

setdefault(key[, default])

If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
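The short Python 3 illustration below shows that behaviour. It also shows why `setdefault(name, []).append(...)` in the answer accumulates values: the list that `setdefault` returns is the same object stored in the dictionary. The keys and values are just sample data:

```python
groups = {}

# Key absent: setdefault stores the default (a new empty list) and returns it,
# so append() mutates the list that now lives in the dict.
groups.setdefault("David", []).append("Red")

# Key present: the default is ignored and the existing list is returned.
groups.setdefault("David", []).append("Ford")

# With no default argument, a missing key is inserted with the value None.
missing = groups.setdefault("Steve")

print(groups)   # {'David': ['Red', 'Ford'], 'Steve': None}
print(missing)  # None
```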