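I want csv.DictReader to infer the field names from the file. The docs say "If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as the fieldnames.", but in my case the first row contains a title and the second row contains the names.
I can't apply next(reader) as suggested in "Python 3.2 skip a line in csv.DictReader", because the field names seem to be assigned when the reader is initialized (or I'm doing it wrong).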
The csvfile (exported from Excel 2010, original source):
CanVec v1.1.0,,,,,,,,,^M
Entity,Attributes combination,"Specification Code
Point","Specification Code
Line","Specification Code
Area",Generic Code,Theme,"GML - Entity name
Shape - File name
Point","GML - Entity name
Shape - File name
Line","GML - Entity name
Shape - File name
Area"^M
Amusement park,Amusement park,,,2260012,2260009,LX,,,LX_2260009_2^M
Auto wrecker,Auto wrecker,,,2360012,2360009,IC,,,IC_2360009_2^M
My code:
import csv

f = open(entities_table, 'rb')
try:
    # Guess the CSV dialect from the first 1024 bytes, then rewind
    dialect = csv.Sniffer().sniff(f.read(1024))
    f.seek(0)
    reader = csv.DictReader(f, dialect=dialect)
    print 'I think the field names are:\n%s\n' % (reader.fieldnames)
    # Print only the first 20 rows
    i = 0
    for row in reader:
        if i < 20:
            print row
            i = i + 1
finally:
    f.close()
Current result:
I think the field names are:
['CanVec v1.1.0', '', '', '', '', '', '', '', '', '']
Desired result:
I think the field names are:
['Entity','Attributes combination','"Specification Code Point"',...snip]
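I realize it would be expedient to simply delete the first line and carry on, but I'm trying to read the data in place as much as possible and minimize manual intervention.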
Woo*_*ble 15
After f.seek(0), insert:
next(f)
This advances the file pointer to the second line before the DictReader is initialized.
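For reference, here is a minimal Python 3 sketch of the same idea, run against an in-memory sample rather than the real file. The SAMPLE data and the read_skipping_title helper are hypothetical names made up for illustration, and the sketch assumes the title occupies exactly one physical line.

```python
import csv
import io

# Hypothetical sample modeled on the question's file: a title line, then the
# real header (with a quoted field containing a line break), then data rows.
SAMPLE = (
    'CanVec v1.1.0,,,\r\n'
    'Entity,Attributes combination,"Specification Code\nPoint",Theme\r\n'
    'Amusement park,Amusement park,,LX\r\n'
    'Auto wrecker,Auto wrecker,,IC\r\n'
)


def read_skipping_title(f):
    """Return (fieldnames, rows) after discarding the first physical line of f."""
    next(f)  # skip the title line before DictReader sees the file
    reader = csv.DictReader(f)
    rows = list(reader)
    return reader.fieldnames, rows


# newline='' leaves line endings untranslated, so the line break embedded in
# the quoted header field reaches the csv module unchanged.
fieldnames, rows = read_skipping_title(io.StringIO(SAMPLE, newline=''))
print(fieldnames)
print(rows[0]['Entity'])
```

With a real file, open(path, newline='') in place of the StringIO object should behave the same way, since next() and DictReader share the one file iterator.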
小智 1
I used islice from itertools. My header was on the last line of a large preamble. I skipped past the preamble and used the header line as the fieldnames:
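import csv
from itertools import islice

with open(file, "r") as f:
    # Skip the preamble, counting lines until the header line is found
    n = 0
    for line in f:
        n += 1
        if 'same_field_name' in line:  # line with field names was found
            # Strip the line ending so the last field name is clean
            h = line.rstrip('\r\n').split(',')
            break

# Reopen the file and start reading at the first line after the header
f = islice(open(file, "r"), n, None)
reader = csv.DictReader(f, fieldnames=h)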
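A possible single-pass variant of this approach, as a sketch only: instead of counting lines and reopening the file, it keeps reading from the same file object once the header line is found. The dict_reader_after_preamble helper, the marker string, and the sample data are hypothetical, and the sketch assumes the header fits on one physical line.

```python
import csv
import io


def dict_reader_after_preamble(f, marker):
    """Advance f to the first line containing marker, then build a DictReader.

    The header line is parsed with csv.reader so quoted commas are handled;
    the remaining lines of f become the data rows.
    """
    for line in f:
        if marker in line:
            fieldnames = next(csv.reader([line]))
            return csv.DictReader(f, fieldnames=fieldnames)
    raise ValueError('no line containing %r was found' % marker)


# Hypothetical sample: two preamble lines, a header, and one data row.
SAMPLE = (
    'Generated by some tool\n'
    'second preamble line\n'
    'name,"city, state",age\n'
    'Ann,"Austin, TX",41\n'
)

reader = dict_reader_after_preamble(io.StringIO(SAMPLE), 'name,')
rows = list(reader)
print(reader.fieldnames)
print(rows)
```

Parsing the header with csv.reader rather than str.split(',') is a design choice here: it keeps a quoted field name such as "city, state" in one piece.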