I'm trying to figure out a way to find duplicate addresses based on a similarity score. Consider these duplicate addresses:
addr_1 = '# 3 FAIRMONT LINK SOUTH'
addr_2 = '3 FAIRMONT LINK S'
addr_3 = '5703 - 48TH AVE'
addr_4 = '5703- 48 AVENUE'
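I'm planning to apply some string transformations to abbreviate long words (e.g. NORTH -> N) and to remove all spaces, commas, dashes and pound signs. Given the output below, how can I compare addr_3 with the rest of the addresses and detect the similar ones? What percentage of similarity is safe? Could you provide simple Python code for this?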
addr_1 = '3FAIRMONTLINKS'
addr_2 = '3FAIRMONTLINKS'
addr_3 = '570348THAV'
addr_4 = '570348AV'
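The transformation described above could be sketched roughly as follows. The `ABBREVIATIONS` table and the `normalize` helper are hypothetical names introduced only for illustration; the table holds just the few entries needed to reproduce the sample output and would have to be extended for real data.

```python
import re

# Hypothetical abbreviation table (an assumption, not an exhaustive list).
ABBREVIATIONS = {
    "NORTH": "N",
    "SOUTH": "S",
    "EAST": "E",
    "WEST": "W",
    "AVENUE": "AV",
    "AVE": "AV",
}

def normalize(address):
    """Upper-case, split on spaces/commas/dashes/pound signs, abbreviate, and join."""
    tokens = re.split(r"[\s,#-]+", address.upper())
    # Empty tokens appear when the address starts or ends with a separator.
    return "".join(ABBREVIATIONS.get(token, token) for token in tokens if token)

print(normalize("# 3 FAIRMONT LINK SOUTH"))
print(normalize("5703- 48 AVENUE"))
```

Abbreviating whole tokens before joining them avoids accidentally rewriting a substring of a longer word once the spaces are gone.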
Thanks,
Eduardo
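One possible way to score the normalized strings is `difflib.SequenceMatcher` from the standard library, sketched below. The `similarity` and `find_similar` helpers and the 0.85 threshold are assumptions made for illustration; there is no universally safe percentage, so the cutoff would need tuning against addresses already known to be duplicates.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a score in [0, 1]; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, a, b).ratio()

def find_similar(target, candidates, threshold=0.85):
    """Return (name, score) pairs for candidates scoring at or above threshold.

    The default threshold is an assumed starting point, not a recommended value.
    """
    scored = ((name, similarity(target, text)) for name, text in candidates.items())
    return [(name, score) for name, score in scored if score >= threshold]

# Normalized addresses taken from the question; addr_3 is the comparison target.
addresses = {
    "addr_1": "3FAIRMONTLINKS",
    "addr_2": "3FAIRMONTLINKS",
    "addr_4": "570348AV",
}
matches = find_similar("570348THAV", addresses)
for name, score in matches:
    print(f"{name}: {score:.0%}")
```

With these inputs only addr_4 passes the cutoff, at a ratio of about 0.89, while addr_1 and addr_2 score far lower against addr_3.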
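I'm looking for a text editor with the following two features:
- Synchronized scrolling: you can place 2 tabs side by side and scroll both at the same time.
- Spell check as you type (highlighting, underlining, on-the-fly spell checking).
So far I've been using Notepad++ because of its synchronized scrolling, but its spell-check support is weak. I wouldn't even mind a word processor with these features, since I mainly use synchronized scrolling for translating texts, with both language versions displayed at once.
I'd appreciate your suggestions.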
I'm puzzled by the problem illustrated in the following example:
"ID","NAME","PHONE","REF","DISCARD"
1,"JOHN",12345,,
2,"PETER",6232,,
3,"JON",12345,,
4,"PETERSON",6232,,
5,"ALEX",7854,,
6,"JON",12345,,
I want to detect duplicates in the "PHONE" column and mark the subsequent duplicates using the "REF" column, with a value pointing to the "ID" of the first item, and with "Yes" in the "DISCARD" column:
"ID","NAME","PHONE","REF","DISCARD"
1,"JOHN",12345,1,
2,"PETER",6232,2,
3,"JON",12345,1,"Yes"
4,"PETERSON",6232,2,"Yes"
5,"ALEX",7854,,
6,"JON",12345,1,"Yes"
So, how do I go about it? I tried this code, but of course my logic isn't right.
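import csv

# Raw strings keep the backslashes in the Windows paths literal.
myfile = open(r"C:\Users\Eduardo\Documents\TEST2.csv", newline="")
myfile1 = open(r"C:\Users\Eduardo\Documents\TEST2.csv", newline="")
dest = csv.writer(open(r"C:\Users\Eduardo\Documents\TESTFIXED.csv", "w", newline=""), dialect="excel")
reader = csv.reader(myfile)
verum = list(reader)
verum.sort(key=lambda x: x[2])  # note: the header row gets sorted along with the data
for i, row in enumerate(verum):
    # This is where the logic goes wrong: verum[i] is row itself, so the test
    # is always true and every row ends up with its own ID in REF.
    if row[2] == verum[i][2]:
        verum[i][3] = row[0]
print(verum)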
Thank you very much for your guidance and help.
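A minimal sketch of one way to produce the desired marking, assuming Python 3 and reading the sample from a string rather than from the original file paths. The `mark_duplicates` helper is a hypothetical name. It counts each phone number first, so that rows with a unique phone (such as ALEX) keep an empty "REF", as in the expected output above.

```python
import csv
import io
from collections import Counter

SAMPLE = '''"ID","NAME","PHONE","REF","DISCARD"
1,"JOHN",12345,,
2,"PETER",6232,,
3,"JON",12345,,
4,"PETERSON",6232,,
5,"ALEX",7854,,
6,"JON",12345,,
'''

def mark_duplicates(rows, key="PHONE"):
    """Fill REF with the ID of the first row sharing the key; flag later rows."""
    counts = Counter(row[key] for row in rows)
    first_id = {}
    for row in rows:
        value = row[key]
        if counts[value] < 2:
            continue  # unique value: leave REF and DISCARD empty
        if value not in first_id:
            first_id[value] = row["ID"]  # first occurrence points at itself
            row["REF"] = row["ID"]
        else:
            row["REF"] = first_id[value]
            row["DISCARD"] = "Yes"
    return rows

reader = csv.DictReader(io.StringIO(SAMPLE))
rows = mark_duplicates(list(reader))

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

The rows stay in their original order because nothing is sorted; the dictionary lookup replaces the sort-and-compare step. The default quoting of `csv.DictWriter` differs from the sample file, so the quote characters in the printed result will not match the input exactly.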
Occasionally we get registrations on our website similar to this one (this is the record from the email notification):
New User Registration
FirstName: cowdqd
LastName: cowdqd
Company: nWJrFxUitwFMbnK
Email: qwupxt@bsesfj.com
Phone: oCFsfSHolrnx
Fax: -152
AddressLineOne: xRQgqnCOJkkoA
AddressLineTwo: obsDvktXDL
City: vqxXGZQgIplDwm
Province: AB
PostalCode: kgyabr
Country: CA
IncludedInPromotions: Yes
RequestInfo: No
Comments: x0EDw7 eecfocnmvwzu, [url=http://fvbppxzancnj.com/]fvbppxzancnj[/url], [link=http://tyflcliodtqa.com/]tyflcliodtqa[/link], http://ldklshrkpwwn.com/
RegistrationDate: February 24 2010
I assume this is spam, but often the email address and the links aren't even valid. Why would anyone want to do this?