mar*_*ion 15 python file-io parsing text python-2.7
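I am trying to parse a series of text files and save them as CSV files using Python (2.7.3). All of the text files have a 4-line header that needs to be stripped out. The data lines use several delimiters: " (quote), - (dash), : (colon), and blank space. Handling all of these delimiters in C++ was a pain, so I decided to try Python, having heard that this kind of task is relatively easy there compared to C/C++.
I wrote a piece of code and tested it on a single line of data, and it works. However, I could not make it work on the actual file. To parse a single line I used a text (string) object and its "replace" method. It looks like my current implementation reads the text file as a list, and a list object has no replace method.
Being a novice in Python, I am stuck at this point. Any input would be appreciated!
Thanks!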
# function for parsing the data
def data_parser(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text

# open input/output files
inputfile = open('test.dat')
outputfile = open('test.csv', 'w')
my_text = inputfile.readlines()[4:]  # reads the whole text file, skipping the first 4 lines

# sample text string, just for demonstration, to show what the data looks like
# my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636'

# dictionary definition: 0-, 1- etc. are there to parse the date block delimited with dashes, and to make sure the negative numbers are not affected
reps = {'"NAN"':'NAN', '"':'', '0-':'0,','1-':'1,','2-':'2,','3-':'3,','4-':'4,','5-':'5,','6-':'6,','7-':'7,','8-':'8,','9-':'9,', ' ':',', ':':',' }

txt = data_parser(my_text, reps)
outputfile.writelines(txt)
inputfile.close()
outputfile.close()
Joe*_*Day 15
I would use a for loop to iterate over the lines in the text file:
for line in my_text:
    outputfile.write(data_parser(line, reps))
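To make the point concrete, here is a minimal self-contained sketch of the per-line approach. It is only an illustration: it assumes Python 3 syntax (`dict.items()` instead of the `iteritems()` used in the question), shortens the sample row, and uses a made-up list named `sample_lines` as a stand-in for `inputfile.readlines()[4:]`.

```python
def data_parser(text, dic):
    """Apply every replacement in dic to a single string."""
    for old, new in dic.items():
        text = text.replace(old, new)
    return text

reps = {'"NAN"': 'NAN', '"': '', ' ': ',', ':': ','}
# Digit-dash pairs split the date block without touching negative numbers.
reps.update({'%d-' % d: '%d,' % d for d in range(10)})

# Hypothetical stand-in for inputfile.readlines()[4:]
sample_lines = ['"2012-06-23 03:09:13.23",4323584,-1.911224,"NAN",-0.586636\n']

# replace() exists on each string in the list, not on the list itself.
parsed = [data_parser(line, reps) for line in sample_lines]
print(parsed[0])
```

Calling `data_parser(sample_lines, reps)` directly would raise an `AttributeError`, which is the failure described in the question.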
If you want to read the file line by line instead of loading the whole thing at the start of the script, you could do something like this:
inputfile = open('test.dat')
outputfile = open('test.csv', 'w')

# sample text string, just for demonstration, to show what the data looks like
# my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636'

# dictionary definition: 0-, 1- etc. are there to parse the date block delimited with dashes, and to make sure the negative numbers are not affected
reps = {'"NAN"':'NAN', '"':'', '0-':'0,','1-':'1,','2-':'2,','3-':'3,','4-':'4,','5-':'5,','6-':'6,','7-':'7,','8-':'8,','9-':'9,', ' ':',', ':':',' }

for i in range(4):
    next(inputfile)  # skip the first four lines
for line in inputfile:
    outputfile.write(data_parser(line, reps))

inputfile.close()
outputfile.close()
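As an alternative way to skip the header, `itertools.islice` can drop the first four lines without a separate counter loop. This is a sketch rather than part of the answer above: it assumes Python 3, shortens the sample row, and uses two made-up `io.StringIO` objects (`fake_input`, `fake_output`) in place of `test.dat` and `test.csv` so that it runs without any files on disk.

```python
import io
from itertools import islice

def data_parser(text, dic):
    """Apply every replacement in dic to a single string."""
    for old, new in dic.items():
        text = text.replace(old, new)
    return text

reps = {'"NAN"': 'NAN', '"': '', ' ': ',', ':': ','}
reps.update({'%d-' % d: '%d,' % d for d in range(10)})

# Hypothetical in-memory stand-ins for the real input and output files.
fake_input = io.StringIO(
    'header 0\nheader 1\nheader 2\nheader 3\n'
    '"2012-06-23 03:09:13.23",4323584,-1.911224,"NAN"\n'
)
fake_output = io.StringIO()

# islice(iterable, 4, None) yields everything from the fifth line onward.
for line in islice(fake_input, 4, None):
    fake_output.write(data_parser(line, reps))

result = fake_output.getvalue()
print(result)
```

With real files, opening both in a single `with` statement would also close them automatically, even if parsing fails partway through.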
DSM*_*DSM 11
Judging from the accepted answer, it looks like your desired behaviour is to turn
skip 0
skip 1
skip 2
skip 3
"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636
into
2012,06,23,03,09,13.23,4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,NAN,-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636
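If that's right, then I think something like
import csv

# Python 2's csv module expects files opened in binary mode ("rb"/"wb")
with open("test.dat", "rb") as infile, open("test.csv", "wb") as outfile:
    reader = csv.reader(infile)
    # QUOTE_MINIMAL (the value False evaluates to) quotes only fields that need it
    writer = csv.writer(outfile, quoting=csv.QUOTE_MINIMAL)
    for i, line in enumerate(reader):
        if i < 4:
            continue  # skip the 4-line header
        date = line[0].split()      # ['2012-06-23', '03:09:13.23']
        day = date[0].split('-')    # ['2012', '06', '23']
        time = date[1].split(':')   # ['03', '09', '13.23']
        newline = day + time + line[1:]
        writer.writerow(newline)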
would be a little simpler than the reps stuff.
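For readers on Python 3, the same idea needs a small change: the `csv` module there works on text-mode files (opened with `newline=''`), not on the binary modes used with Python 2. The sketch below is an adaptation under that assumption, not the answerer's code. It shortens the sample row and uses in-memory `io.StringIO` objects instead of real files; the names `raw`, `stamp`, `clock`, and `converted` are made up for illustration.

```python
import csv
import io

# Four header lines followed by one shortened sample data row.
raw = (
    'skip 0\nskip 1\nskip 2\nskip 3\n'
    '"2012-06-23 03:09:13.23",4323584,-1.911224,"NAN",-0.586636\n'
)
infile = io.StringIO(raw)
outfile = io.StringIO(newline='')

reader = csv.reader(infile)  # strips the double quotes while parsing
writer = csv.writer(outfile, lineterminator='\n')
for i, row in enumerate(reader):
    if i < 4:
        continue  # skip the 4-line header
    stamp = row[0].split()       # date part and time part
    day = stamp[0].split('-')    # year, month, day
    clock = stamp[1].split(':')  # hour, minute, seconds
    writer.writerow(day + clock + row[1:])

converted = outfile.getvalue()
print(converted)
```

Because only the first field is split on dashes, negative numbers in the remaining columns are never touched, so no digit-dash replacement table is needed.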