rom*_*rom 43 python csv import file delimiter
我想导入两种CSV文件,有些使用";" 用于分隔符和其他人使用",".到目前为止,我一直在接下来的两行之间切换:
reader=csv.reader(f,delimiter=';')
Run Code Online (Sandbox Code Playgroud)
要么
reader=csv.reader(f,delimiter=',')
Run Code Online (Sandbox Code Playgroud)
是否有可能不指定分隔符并让程序检查正确的分隔符?
下面的解决方案(Blender和sharth)似乎适用于以逗号分隔的文件(使用Libroffice生成),但不适用于以分号分隔的文件(使用MS Office生成).以下是一个以分号分隔的文件的第一行:
ReleveAnnee;ReleveMois;NoOrdre;TitreRMC;AdopCSRegleVote;AdopCSAbs;AdoptCSContre;NoCELEX;ProposAnnee;ProposChrono;ProposOrigine;NoUniqueAnnee;NoUniqueType;NoUniqueChrono;PropoSplittee;Suite2LecturePE;Council PATH;Notes
1999;1;1;1999/83/EC: Council Decision of 18 January 1999 authorising the Kingdom of Denmark to apply or to continue to apply reductions in, or exemptions from, excise duties on certain mineral oils used for specific purposes, in accordance with the procedure provided for in Article 8(4) of Directive 92/81/EEC;U;;;31999D0083;1998;577;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document
1999;1;2;1999/81/EC: Council Decision of 18 January 1999 authorising the Kingdom of Spain to apply a measure derogating from Articles 2 and 28a(1) of the Sixth Directive (77/388/EEC) on the harmonisation of the laws of the Member States relating to turnover taxes;U;;;31999D0081;1998;184;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document
Run Code Online (Sandbox Code Playgroud)
Bil*_*nch 47
该csv
模块似乎建议使用csv嗅探器解决此问题.
他们给出了以下示例,我已根据您的情况进行了调整.
with open('example.csv', 'rb') as csvfile: # python 3: 'r',newline=""
dialect = csv.Sniffer().sniff(csvfile.read(1024), delimiters=";,")
csvfile.seek(0)
reader = csv.reader(csvfile, dialect)
# ... process CSV file contents here ...
Run Code Online (Sandbox Code Playgroud)
我们来试试吧.
[9:13am][wlynch@watermelon /tmp] cat example
#!/usr/bin/env python
import csv
def parse(filename):
with open(filename, 'rb') as csvfile:
dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=';,')
csvfile.seek(0)
reader = csv.reader(csvfile, dialect)
for line in reader:
print line
def main():
print 'Comma Version:'
parse('comma_separated.csv')
print
print 'Semicolon Version:'
parse('semicolon_separated.csv')
print
print 'An example from the question (kingdom.csv)'
parse('kingdom.csv')
if __name__ == '__main__':
main()
Run Code Online (Sandbox Code Playgroud)
我们的样本输入
[9:13am][wlynch@watermelon /tmp] cat comma_separated.csv
test,box,foo
round,the,bend
[9:13am][wlynch@watermelon /tmp] cat semicolon_separated.csv
round;the;bend
who;are;you
[9:22am][wlynch@watermelon /tmp] cat kingdom.csv
ReleveAnnee;ReleveMois;NoOrdre;TitreRMC;AdopCSRegleVote;AdopCSAbs;AdoptCSContre;NoCELEX;ProposAnnee;ProposChrono;ProposOrigine;NoUniqueAnnee;NoUniqueType;NoUniqueChrono;PropoSplittee;Suite2LecturePE;Council PATH;Notes
1999;1;1;1999/83/EC: Council Decision of 18 January 1999 authorising the Kingdom of Denmark to apply or to continue to apply reductions in, or exemptions from, excise duties on certain mineral oils used for specific purposes, in accordance with the procedure provided for in Article 8(4) of Directive 92/81/EEC;U;;;31999D0083;1998;577;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document
1999;1;2;1999/81/EC: Council Decision of 18 January 1999 authorising the Kingdom of Spain to apply a measure derogating from Articles 2 and 28a(1) of the Sixth Directive (77/388/EEC) on the harmonisation of the laws of the Member States relating to turnover taxes;U;;;31999D0081;1998;184;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document
Run Code Online (Sandbox Code Playgroud)
如果我们执行示例程序:
[9:14am][wlynch@watermelon /tmp] ./example
Comma Version:
['test', 'box', 'foo']
['round', 'the', 'bend']
Semicolon Version:
['round', 'the', 'bend']
['who', 'are', 'you']
An example from the question (kingdom.csv)
['ReleveAnnee', 'ReleveMois', 'NoOrdre', 'TitreRMC', 'AdopCSRegleVote', 'AdopCSAbs', 'AdoptCSContre', 'NoCELEX', 'ProposAnnee', 'ProposChrono', 'ProposOrigine', 'NoUniqueAnnee', 'NoUniqueType', 'NoUniqueChrono', 'PropoSplittee', 'Suite2LecturePE', 'Council PATH', 'Notes']
['1999', '1', '1', '1999/83/EC: Council Decision of 18 January 1999 authorising the Kingdom of Denmark to apply or to continue to apply reductions in, or exemptions from, excise duties on certain mineral oils used for specific purposes, in accordance with the procedure provided for in Article 8(4) of Directive 92/81/EEC', 'U', '', '', '31999D0083', '1998', '577', 'COM', 'NULL', 'CS', 'NULL', '', '', '', 'Propos* are missing on Celex document']
['1999', '1', '2', '1999/81/EC: Council Decision of 18 January 1999 authorising the Kingdom of Spain to apply a measure derogating from Articles 2 and 28a(1) of the Sixth Directive (77/388/EEC) on the harmonisation of the laws of the Member States relating to turnover taxes', 'U', '', '', '31999D0081', '1998', '184', 'COM', 'NULL', 'CS', 'NULL', '', '', '', 'Propos* are missing on Celex document']
Run Code Online (Sandbox Code Playgroud)
它也可能值得注意我使用的是什么版本的python.
[9:20am][wlynch@watermelon /tmp] python -V
Python 2.7.2
Run Code Online (Sandbox Code Playgroud)
给定一个处理两者的项目,(逗号)和| (垂直条)分隔的CSV文件,格式正确,我尝试了以下内容(如https://docs.python.org/2/library/csv.html#csv.Sniffer所示):
dialect = csv.Sniffer().sniff(csvfile.read(1024), delimiters=',|')
Run Code Online (Sandbox Code Playgroud)
但是,在| -delimited文件上,返回了"无法确定分隔符"异常.如果每条线具有相同数量的分隔符(不包括引号中可能包含的内容),推测嗅探启发式可能最有效是合理的.因此,我没有读取文件的前1024个字节,而是尝试完整地阅读前两行:
temp_lines = csvfile.readline() + '\n' + csvfile.readline()
dialect = csv.Sniffer().sniff(temp_lines, delimiters=',|')
Run Code Online (Sandbox Code Playgroud)
到目前为止,这对我来说效果很好.
为了解决这个问题,我创建了一个函数,它读取文件的第一行(标题)并检测分隔符.
def detectDelimiter(csvFile):
with open(csvFile, 'r') as myCsvfile:
header=myCsvfile.readline()
if header.find(";")!=-1:
return ";"
if header.find(",")!=-1:
return ","
#default delimiter (MS Office export)
return ";"
Run Code Online (Sandbox Code Playgroud)
如果你正在使用DictReader
你可以这样做:
#!/usr/bin/env python
import csv
def parse(filename):
with open(filename, 'rb') as csvfile:
dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=';,')
csvfile.seek(0)
reader = csv.DictReader(csvfile, dialect=dialect)
for line in reader:
print(line['ReleveAnnee'])
Run Code Online (Sandbox Code Playgroud)
我使用它Python 3.5
并且它以这种方式工作.