Jas*_*unn 5 python optimization file
I am parsing date/time/measurement information from text files that look like this:
[Sun Jul 15 09:05:56.724 2018] *000129.32347
[Sun Jul 15 09:05:57.722 2018] *000129.32352
[Sun Jul 15 09:05:58.721 2018] *000129.32342
[Sun Jul 15 09:05:59.719 2018] *000129.32338
[Sun Jul 15 09:06:00.733 2018] *000129.32338
[Sun Jul 15 09:06:01.732 2018] *000129.32352
The results go into an output file that looks like this:
07-15-2018 09:05:56.724, 29.32347
07-15-2018 09:05:57.722, 29.32352
07-15-2018 09:05:58.721, 29.32342
07-15-2018 09:05:59.719, 29.32338
07-15-2018 09:06:00.733, 29.32338
07-15-2018 09:06:01.732, 29.32352
The code I am using looks like this:
import os
import datetime
with open('dq_barorun_20180715_calibtest.log', 'r') as fh, open('output.txt', 'w') as fh2:
    for line in fh:
        line = line.split()
        monthalpha = line[1]
        month = datetime.datetime.strptime(monthalpha, '%b').strftime('%m')
        day = line[2]
        time = line[3]
        yearbracket = line[4]
        year = yearbracket[0:4]
        pressfull = line[5]
        press = pressfull[5:13]
        timestamp = month+"-"+day+"-"+year+" "+time
        fh2.write(timestamp + ", " + press + "\n")
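This code works and does what I need, but I am trying to learn more efficient ways of parsing files in Python. It takes about 30 seconds to process a 100MB file, and I have several files that are 1-2GB in size. Is there a faster way to parse this file?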
You can declare a months dict instead of using the datetime module, which should be a bit faster.
months = {"Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04", "May": "05", "Jun": "06",
"Jul": "07", "Aug": "08", "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12"}
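As a quick sanity check, the lookup table can be compared against the strptime/strftime round trip from the question. This is only a minimal sketch: via_datetime is a hypothetical helper name, and it assumes an English (or C) locale, because %b is locale-dependent.

```python
import datetime

months = {"Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04", "May": "05", "Jun": "06",
          "Jul": "07", "Aug": "08", "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12"}


def via_datetime(abbr):
    """Month number the way the question computes it: parse with %b, format with %m."""
    return datetime.datetime.strptime(abbr, "%b").strftime("%m")


# Collect any abbreviation where the dict and datetime disagree.
# Assumption: the process runs under an English/C locale.
mismatches = [abbr for abbr in months if via_datetime(abbr) != months[abbr]]
print(mismatches)
```

If the list comes back empty, the dict is a drop-in replacement for the datetime call on this input.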
You can also use unpacking to make the code simpler:
for line in fh:
    _, month, day, time, year, last = line.split()
    res = months[month] + "-" + day + "-" + year[:4] + " " + time + ", " + last[5:]
    fh2.write(res + "\n")  # split() drops the trailing newline, so add it back
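Put together, the two suggestions might look like the following sketch. The function names convert_line and convert are hypothetical, the in-memory io.StringIO objects stand in for the real log and output files, and skipping blank lines is an added assumption that the original code does not make.

```python
import io

months = {"Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04", "May": "05", "Jun": "06",
          "Jul": "07", "Aug": "08", "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12"}


def convert_line(line):
    """Turn '[Sun Jul 15 09:05:56.724 2018] *000129.32347' into one output row."""
    # Fields: '[Sun', 'Jul', '15', '09:05:56.724', '2018]', '*000129.32347'
    _, month, day, time, year, last = line.split()
    return months[month] + "-" + day + "-" + year[:4] + " " + time + ", " + last[5:] + "\n"


def convert(src, dst):
    """Convert every non-blank line from src and write it to dst."""
    for line in src:
        if line.strip():  # assumption: blank lines are skipped rather than raising
            dst.write(convert_line(line))


# Stand-ins for the real files, using two lines from the question.
sample = io.StringIO(
    "[Sun Jul 15 09:05:56.724 2018] *000129.32347\n"
    "[Sun Jul 15 09:05:57.722 2018] *000129.32352\n"
)
out = io.StringIO()
convert(sample, out)
print(out.getvalue(), end="")
```

Because convert only needs objects that can be iterated and written to, the same function can be called with the two handles from the with open(...) statement in the question.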
P.S. timeit shows this is about 10 times faster.
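The answer does not show how the measurement was taken. One way to reproduce a comparison of this kind is sketched below; the function names, the single sample line, and the repetition count are all assumptions, and the ratio will vary by machine and Python version.

```python
import datetime
import timeit

months = {"Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04", "May": "05", "Jun": "06",
          "Jul": "07", "Aug": "08", "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12"}

LINE = "[Sun Jul 15 09:05:56.724 2018] *000129.32347\n"


def parse_with_datetime(line):
    """Per-line work from the question, using strptime for the month."""
    parts = line.split()
    month = datetime.datetime.strptime(parts[1], "%b").strftime("%m")
    timestamp = month + "-" + parts[2] + "-" + parts[4][0:4] + " " + parts[3]
    return timestamp + ", " + parts[5][5:13] + "\n"


def parse_with_dict(line):
    """Per-line work from the answer, using the dict lookup and unpacking."""
    _, month, day, time, year, last = line.split()
    return months[month] + "-" + day + "-" + year[:4] + " " + time + ", " + last[5:] + "\n"


# Both versions must produce the same row before timing means anything.
assert parse_with_datetime(LINE) == parse_with_dict(LINE)

n = 20_000  # arbitrary repetition count
t_datetime = timeit.timeit(lambda: parse_with_datetime(LINE), number=n)
t_dict = timeit.timeit(lambda: parse_with_dict(LINE), number=n)
print(f"datetime: {t_datetime:.3f}s  dict: {t_dict:.3f}s  ratio: {t_datetime / t_dict:.1f}x")
```

This times only the string handling for one line, not file I/O, so the speedup on a whole 1-2GB file may be smaller than the per-line ratio suggests.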