I have a csv file that looks like this:
+-----+-----+-----+-----+-----+-----+-----+-----+
| AAA | bbb | ccc | DDD | eee | FFF | GGG | hhh |
+-----+-----+-----+-----+-----+-----+-----+-----+
|   1 |   2 |   3 |   4 |  50 |   3 |  20 |   4 |
|   2 |   1 |   3 |   5 |  24 |   2 |  23 |   5 |
|   4 |   1 |   3 |   6 |  34 |   1 |  22 |   5 |
|   2 |   1 |   3 |   5 |  24 |   2 |  23 |   5 |
|   2 |   1 |   3 |   5 |  24 |   2 |  23 |   5 |
+-----+-----+-----+-----+-----+-----+-----+-----+
...
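How can I read only the columns "AAA,DDD,FFF,GGG" in Python and skip the header? The output I want is a list of tuples like this: [(1,4,3,20),(2,5,2,23),(4,6,1,22)]. I plan to write this data to a SQL database later.
I referred to this post: Read specific columns from a csv file with csv module?. But I don't think it helps in my case. Since my .csv is quite large, with a huge number of columns, I would like to tell Python the names of the columns I want, so that it reads only those columns, row by row.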
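I realize an answer has already been accepted, but if you really want to read specific named columns from a csv file, you should use a DictReader (if you're not using Pandas, that is).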
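import csv
from io import StringIO  # on Python 2: from StringIO import StringIO

columns = 'AAA,DDD,FFF,GGG'.split(',')

testdata = '''\
AAA,bbb,ccc,DDD,eee,FFF,GGG,hhh
1,2,3,4,50,3,20,4
2,1,3,5,24,2,23,5
4,1,3,6,34,1,22,5
2,1,3,5,24,2,23,5
2,1,3,5,24,2,23,5
'''

# DictReader consumes the header row and keys every data row by column name.
reader = csv.DictReader(StringIO(testdata))
# Generator: builds one tuple per row, holding only the requested columns.
desired_cols = (tuple(row[col] for col in columns) for row in reader)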
Output:
>>> list(desired_cols)
[('1', '4', '3', '20'),
('2', '5', '2', '23'),
('4', '6', '1', '22'),
('2', '5', '2', '23'),
('2', '5', '2', '23')]
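Note that the csv module returns every field as a string, while the question asks for integers that will later go into a SQL database. A minimal sketch of both steps follows, assuming every selected column holds integers. The in-memory SQLite database and the table name `selected` are assumptions made for illustration; they do not come from the question.

```python
import csv
import sqlite3
from io import StringIO

columns = ["AAA", "DDD", "FFF", "GGG"]
testdata = """\
AAA,bbb,ccc,DDD,eee,FFF,GGG,hhh
1,2,3,4,50,3,20,4
2,1,3,5,24,2,23,5
4,1,3,6,34,1,22,5
"""

reader = csv.DictReader(StringIO(testdata))
# Convert each selected field to int (assumes the columns are all integers).
int_rows = [tuple(int(row[col]) for col in columns) for row in reader]

# Hypothetical target: an in-memory SQLite table named "selected".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE selected (AAA INTEGER, DDD INTEGER, FFF INTEGER, GGG INTEGER)"
)
# Parameterized bulk insert, one tuple per row.
conn.executemany("INSERT INTO selected VALUES (?, ?, ?, ?)", int_rows)
conn.commit()

stored = conn.execute(
    "SELECT AAA, DDD, FFF, GGG FROM selected ORDER BY rowid"
).fetchall()
```

With another database, the placeholder style in the INSERT statement may differ depending on the driver.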
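import csv
from collections import namedtuple

def read_csv(file, columns, type_name="Row"):
    try:
        row_type = namedtuple(type_name, columns)
    except ValueError:
        # Column names that are not valid identifiers: use plain tuples.
        def row_type(*values):
            return tuple(values)
    rows = iter(csv.reader(file))
    header = next(rows)  # the first row holds the column names
    # Position of each requested column within the header.
    mapping = [header.index(x) for x in columns]
    for row in rows:
        row = row_type(*[row[i] for i in mapping])
        yield row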
Example:
>>> import csv
>>> from collections import namedtuple
>>> from io import StringIO
>>> def read_csv(file, columns, type_name="Row"):
...     try:
...         row_type = namedtuple(type_name, columns)
...     except ValueError:
...         def row_type(*values):
...             return tuple(values)
...     rows = iter(csv.reader(file))
...     header = next(rows)
...     mapping = [header.index(x) for x in columns]
...     for row in rows:
...         row = row_type(*[row[i] for i in mapping])
...         yield row
...
>>> testdata = """\
... AAA,bbb,ccc,DDD,eee,FFF,GGG,hhh
... 1,2,3,4,50,3,20,4
... 2,1,3,5,24,2,23,5
... 4,1,3,6,34,1,22,5
... 2,1,3,5,24,2,23,5
... 2,1,3,5,24,2,23,5
... """
>>> testfile = StringIO(testdata)
>>> for row in read_csv(testfile, "AAA GGG DDD".split()):
...     print(row)
...
Row(AAA='1', GGG='20', DDD='4')
Row(AAA='2', GGG='23', DDD='5')
Row(AAA='4', GGG='22', DDD='6')
Row(AAA='2', GGG='23', DDD='5')
Row(AAA='2', GGG='23', DDD='5')
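The examples in this thread read from an in-memory StringIO, while the question is about a large file on disk. Here is a rough sketch of the same column selection against a real file, streamed one row at a time so the whole file never sits in memory. The helper name `iter_columns` and the temporary file `example.csv` are made up for this illustration.

```python
import csv
import os
import tempfile

def iter_columns(path, columns):
    """Yield one tuple per data row, holding only the named columns."""
    # newline="" is what the csv module documentation recommends for files.
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            yield tuple(row[col] for col in columns)

# Demo on a small temporary file (hypothetical name and contents).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")
    with open(path, "w", newline="") as handle:
        handle.write("AAA,bbb,ccc,DDD,eee,FFF,GGG,hhh\n")
        handle.write("1,2,3,4,50,3,20,4\n")
        handle.write("2,1,3,5,24,2,23,5\n")
    result = list(iter_columns(path, ["AAA", "DDD", "FFF", "GGG"]))
```

Because `iter_columns` is a generator, a caller can also loop over it directly instead of building a list, which matters for a file with many rows. A column name missing from the header raises KeyError on the first row.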