Posts by ash*_*086

Improving fuzzywuzzy: matching names across 2 lists

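My requirement is to find the matching names between 2 lists. The first list has 400 names and the second has 90,000 names. I get the desired result, but the process takes more than 35 minutes. Clearly the 2 nested for loops take O(N*N) operations, and that is the bottleneck. I have already removed duplicates from both lists. Could you help me improve this? I checked many other questions but somehow could not get an implementation working. If you think I have simply missed an existing post, please point it out; I will do my best to understand and replicate it.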

Below is my code:

from fuzzywuzzy import fuzz
infile=open('names.txt','r')
name=infile.readline()
name_list=[]
while name:
    name_list.append(name.strip())
    name=infile.readline()

print (name_list)

infile2=open('names2.txt','r')
name2=infile2.readline()
name_list2=[]
while name2:
    name_list2.append(name2.strip())
    name2=infile2.readline()

print (name_list2)

response = {}
for name_to_find in name_list:
    for name_master in name_list2:
        if fuzz.ratio(name_to_find,name_master) > 90:
            response[name_to_find] = name_master
            break

for key, value in response.items():
    print ("Key is ->" + key + "  Value is -> " + value)

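One way to cut the cost of the nested loops above is to reject most pairs with a cheap upper bound before running the full comparison. The sketch below is only an illustration under stated assumptions: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for `fuzz.ratio` (scores can differ slightly from fuzzywuzzy, especially when python-Levenshtein is installed), and the function name `find_matches` and the 0.9 threshold (mirroring `> 90`) are hypothetical choices, not part of the original code.

```python
from difflib import SequenceMatcher


def find_matches(names, master_names, threshold=0.9):
    """Map each name to the first master name scoring above threshold.

    Hypothetical helper: difflib stands in for fuzz.ratio, so scores
    are on a 0..1 scale instead of 0..100.
    """
    response = {}
    for name in names:
        matcher = SequenceMatcher(None, autojunk=False)
        # SequenceMatcher caches data about seq2, so fix it once per name
        # and only swap seq1 inside the inner loop.
        matcher.set_seq2(name)
        for candidate in master_names:
            matcher.set_seq1(candidate)
            # real_quick_ratio() and quick_ratio() are cheap upper bounds
            # on ratio(); if either is too low, ratio() cannot pass.
            if (matcher.real_quick_ratio() > threshold
                    and matcher.quick_ratio() > threshold
                    and matcher.ratio() > threshold):
                response[name] = candidate
                break
    return response


if __name__ == "__main__":
    found = find_matches(["jonathan smith", "mary"],
                         ["zzz", "jonathan smyth", "mary"])
    for key, value in found.items():
        print("Key is -> " + key + "  Value is -> " + value)
```

The loop is still O(N*M) in the worst case; the bounds only make most iterations much cheaper, since names of very different length or character content are skipped without a full alignment.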

python performance time long-integer fuzzywuzzy

ash*_*086

2019-03-04

7
Score

1
Answers

5308
Views

"'con' is not a connection" error in an R program

I am trying to use readLines in R, but I get the error below:

orders1<-readLines(orders,2)
# Error in readLines(orders, 2) : 'con' is not a connection


Code:

orders<-read.csv("~/orders.csv")

orders 

orders1<-readLines(orders,2)

orders1


Data:

id,item,quantity_ordered,item_cost
1,playboy roll,1,12
1,rockstar roll,1,10
1,spider roll,1,8
1,keystone roll,1,25
1,salmon sashimi,6,3
1,tuna sashimi,6,2.5
1,edamame,1,6
2,salmon skin roll,1,8
2,playboy roll,1,12
2,unobtanium roll,1,35
2,tuna sashimi,4,2.5
2,yellowtail hand roll,1,7
4,california roll,1,4
4,cucumber roll,1,3.5
5,unagi roll,1,6.5
5,firecracker roll,1,9
5,unobtanium roll,1,35
,chicken teriaki hibachi,1,7.95
,diet coke,1,1.95


ash*_*086

2019-09-11

3
Score

1
Answers

9377
Views