I've run into an interesting problem:
file1.csv has a few hundred rows, for example:
Code,DTime
1,2010-12-26 17:01
2,2010-12-26 17:07
2,2010-12-26 17:15
file2.csv has about 11 million rows, for example:
id,D,Sym,DateTime,Bid,Ask
1375022797,D,USD,2010-12-26 17:00:15,1.311400,1.311700
1375022965,D,USD,2010-12-26 17:00:56,1.311200,1.311500
1375022984,D,USD,2010-12-26 17:00:56,1.311300,1.311600
1375023013,D,USD,2010-12-26 17:01:01,1.311200,1.311500
1375023039,D,USD,2010-12-26 17:01:02,1.311100,1.311400
1375023055,D,USD,2010-12-26 17:01:03,1.311200,1.311500
1375023063,D,USD,2010-12-26 17:01:03,1.311300,1.311600
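What I want to do is write a script that takes each DTime value in file1.csv, finds the first row in file2.csv whose DateTime column partially matches it, and outputs DateTime, Bid and Ask for that row. The partial match is on the first 16 characters.
Both files are sorted from oldest to newest, so if "2010-12-26 17:01" in file1.csv matches 4 entries in file2.csv, I only need to pull out the first one: "2010-12-26 17:01:01".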
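The matching rule can be made concrete: the first 16 characters of a file2.csv DateTime value have exactly the YYYY-MM-DD HH:MM shape of a file1.csv DTime, so the partial match amounts to comparing DateTime[:16] with DTime. A minimal sketch on the sample DateTime values above; minute_key is a hypothetical helper name, not part of any library.

```python
def minute_key(timestamp: str) -> str:
    """Return the 'YYYY-MM-DD HH:MM' prefix (first 16 characters) of a timestamp."""
    return timestamp[:16]


# DateTime values copied from the file2.csv sample above.
sample = [
    "2010-12-26 17:00:15",
    "2010-12-26 17:00:56",
    "2010-12-26 17:00:56",
    "2010-12-26 17:01:01",
    "2010-12-26 17:01:02",
    "2010-12-26 17:01:03",
    "2010-12-26 17:01:03",
]

dtime = "2010-12-26 17:01"
# The rows are sorted oldest to newest, so the first hit is the one wanted.
first = next(ts for ts in sample if minute_key(ts) == dtime)
print(first)  # 2010-12-26 17:01:01
```

Four of the sample rows share the 17:01 prefix; taking the first one in file order is what makes the sort order matter.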
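I'm not sure how to proceed. I tried a dictionary, but the order of the values matters, so I'm not sure that will work. Maybe put file1's DTime column into a list and, for each entry in that list, search file2 for the DateTime?
Thanks, everyone.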
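A dictionary can in fact work here, as long as only the first row per minute is kept. A minimal sketch, assuming file2.csv is sorted oldest to newest as stated: put file1's few hundred DTime values in a set, stream file2.csv once, and record a row only if its 16-character prefix is wanted and not yet seen. The function name first_rows_by_minute is hypothetical, and io.StringIO stands in for the real files (with real files, pass open("file1.csv", newline="") and so on).

```python
import csv
import io


def first_rows_by_minute(file1, file2):
    """Map each DTime in file1 to (DateTime, Bid, Ask) of its first match in file2.

    Assumes file2 is sorted oldest to newest, so the first row seen for a minute
    is the earliest one. DTime values with no match are simply absent.
    """
    reader1 = csv.reader(file1)
    next(reader1)  # skip header
    wanted = {dtime for _code, dtime in reader1}

    found = {}
    reader2 = csv.reader(file2)
    next(reader2)  # skip header
    for _id, _d, _sym, date_time, bid, ask in reader2:
        key = date_time[:16]  # 'YYYY-MM-DD HH:MM'
        # Keep only the first row per minute; later rows for that minute are ignored.
        if key in wanted and key not in found:
            found[key] = (date_time, bid, ask)
            if len(found) == len(wanted):  # every DTime matched: stop reading early
                break
    return found


FILE1_SAMPLE = "Code,DTime\n1,2010-12-26 17:01\n2,2010-12-26 17:07\n2,2010-12-26 17:15\n"
FILE2_SAMPLE = """\
id,D,Sym,DateTime,Bid,Ask
1375022797,D,USD,2010-12-26 17:00:15,1.311400,1.311700
1375022965,D,USD,2010-12-26 17:00:56,1.311200,1.311500
1375022984,D,USD,2010-12-26 17:00:56,1.311300,1.311600
1375023013,D,USD,2010-12-26 17:01:01,1.311200,1.311500
1375023039,D,USD,2010-12-26 17:01:02,1.311100,1.311400
1375023055,D,USD,2010-12-26 17:01:03,1.311200,1.311500
1375023063,D,USD,2010-12-26 17:01:03,1.311300,1.311600
"""

result = first_rows_by_minute(io.StringIO(FILE1_SAMPLE), io.StringIO(FILE2_SAMPLE))
print(result)  # {'2010-12-26 17:01': ('2010-12-26 17:01:01', '1.311200', '1.311500')}
```

Only the small file is held in memory, and duplicate DTime values in file1.csv are harmless because they collapse to the same dictionary key.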
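Because both files are sorted, you can read file2.csv just once and let each search resume where the previous one stopped. This should work as long as there are no duplicate DTime values: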
import csv

with open("file1.csv", newline="") as f1, open("file2.csv", newline="") as f2:
    file1reader = csv.reader(f1, delimiter=",")
    file2reader = csv.reader(f2, delimiter=",")
    header1 = next(file1reader)  # header
    header2 = next(file2reader)  # header
    for Code, DTime in file1reader:
        # file2reader is shared, so each search resumes where the previous one stopped
        for id_, D, Sym, DateTime, Bid, Ask in file2reader:
            if DateTime.startswith(DTime):  # found it
                print(DateTime, Bid, Ask)  # output data
                break  # break and continue where we left off next time
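To try the loop above without the real files, it can be wrapped in a generator that accepts any file-like objects. In this sketch io.StringIO holds the sample rows from the question, and first_prefix_matches is a hypothetical name; the logic is the same shared-reader prefix search.

```python
import csv
import io


def first_prefix_matches(file1, file2):
    """Yield (DateTime, Bid, Ask) for the first file2 row whose DateTime starts with each DTime.

    Both inputs must be sorted oldest to newest; file2 is read only once.
    """
    reader1 = csv.reader(file1)
    reader2 = csv.reader(file2)
    next(reader1)  # skip header
    next(reader2)  # skip header
    for _code, dtime in reader1:
        for _id, _d, _sym, date_time, bid, ask in reader2:
            if date_time.startswith(dtime):
                yield date_time, bid, ask
                break  # the next DTime resumes from the following row


FILE1_SAMPLE = "Code,DTime\n1,2010-12-26 17:01\n2,2010-12-26 17:07\n2,2010-12-26 17:15\n"
FILE2_SAMPLE = """\
id,D,Sym,DateTime,Bid,Ask
1375022797,D,USD,2010-12-26 17:00:15,1.311400,1.311700
1375022965,D,USD,2010-12-26 17:00:56,1.311200,1.311500
1375022984,D,USD,2010-12-26 17:00:56,1.311300,1.311600
1375023013,D,USD,2010-12-26 17:01:01,1.311200,1.311500
1375023039,D,USD,2010-12-26 17:01:02,1.311100,1.311400
1375023055,D,USD,2010-12-26 17:01:03,1.311200,1.311500
1375023063,D,USD,2010-12-26 17:01:03,1.311300,1.311600
"""

for row in first_prefix_matches(io.StringIO(FILE1_SAMPLE), io.StringIO(FILE2_SAMPLE)):
    print(*row)  # 2010-12-26 17:01:01 1.311200 1.311500
```

Only one line is printed: the sample file2.csv has no 17:07 row, so the inner loop reads to the end while looking for it, and 17:15 then finds the reader already exhausted. With the full file, any DTime that has no row in its exact minute would have the same effect on every later DTime.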
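Edit: comparing parsed datetime values instead of string prefixes also covers a DTime that has no row in its exact minute; it picks the first row at or after that minute instead of reading file2.csv to the end.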
import csv
from datetime import datetime

with open("file1.csv", newline="") as f1, open("file2.csv", newline="") as f2:
    file1reader = csv.reader(f1, delimiter=",")
    file2reader = csv.reader(f2, delimiter=",")
    header1 = next(file1reader)  # header
    header2 = next(file2reader)  # header
    for Code, DTime in file1reader:
        DTime = datetime.strptime(DTime, "%Y-%m-%d %H:%M")
        for id_, D, Sym, DateTime, Bid, Ask in file2reader:
            DateTime = datetime.strptime(DateTime, "%Y-%m-%d %H:%M:%S")
            if DateTime >= DTime:  # found it: first row at or after this minute
                print(DateTime, Bid, Ask)  # output data
                break  # break and continue where we left off next time
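The same ">=" logic can be checked on the question's sample rows by wrapping it in a generator over file-like objects. In this sketch first_rows_at_or_after is a hypothetical name, io.StringIO stands in for the files, and the DTime "2010-12-26 16:59" is a hypothetical query chosen because no sample row falls in that minute.

```python
import csv
import io
from datetime import datetime


def first_rows_at_or_after(file1, file2):
    """Yield (DateTime, Bid, Ask) for the first file2 row at or after each DTime.

    Mirrors the edited loop: both files sorted oldest to newest, file2 read once.
    """
    reader1 = csv.reader(file1)
    reader2 = csv.reader(file2)
    next(reader1)  # skip header
    next(reader2)  # skip header
    for _code, dtime in reader1:
        target = datetime.strptime(dtime, "%Y-%m-%d %H:%M")
        for _id, _d, _sym, date_time, bid, ask in reader2:
            stamp = datetime.strptime(date_time, "%Y-%m-%d %H:%M:%S")
            if stamp >= target:
                yield stamp, bid, ask
                break  # the next DTime resumes from the following row


# Hypothetical query: 16:59 has no rows of its own, 17:01 is from the question.
FILE1_SAMPLE = "Code,DTime\n1,2010-12-26 16:59\n2,2010-12-26 17:01\n"
FILE2_SAMPLE = """\
id,D,Sym,DateTime,Bid,Ask
1375022797,D,USD,2010-12-26 17:00:15,1.311400,1.311700
1375022965,D,USD,2010-12-26 17:00:56,1.311200,1.311500
1375022984,D,USD,2010-12-26 17:00:56,1.311300,1.311600
1375023013,D,USD,2010-12-26 17:01:01,1.311200,1.311500
1375023039,D,USD,2010-12-26 17:01:02,1.311100,1.311400
1375023055,D,USD,2010-12-26 17:01:03,1.311200,1.311500
1375023063,D,USD,2010-12-26 17:01:03,1.311300,1.311600
"""

for stamp, bid, ask in first_rows_at_or_after(io.StringIO(FILE1_SAMPLE), io.StringIO(FILE2_SAMPLE)):
    print(stamp, bid, ask)
# 2010-12-26 17:00:15 1.311400 1.311700
# 2010-12-26 17:01:01 1.311200 1.311500
```

The 16:59 query falls through to the next available row (17:00:15) rather than exhausting the reader, and the 17:01 search then resumes right after it.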