Nic*_*ais 4 python dataframe pandas
I'm trying to compute the Levenshtein distance between two Pandas columns, but I'm stuck. I'm using the textdistance library. Here is a minimal, reproducible example:
import pandas as pd
from textdistance import levenshtein
attempts = [['passw0rd', 'pasw0rd'],
            ['passwrd', 'psword'],
            ['psw0rd', 'passwor']]
df = pd.DataFrame(attempts, columns=['password', 'attempt'])
   password  attempt
0  passw0rd  pasw0rd
1   passwrd   psword
2    psw0rd  passwor
My poor attempt:
df.apply(lambda x: levenshtein.distance(*zip(x['password'] + x['attempt'])), axis=1)
This is how the function works. It takes two strings as arguments:
levenshtein.distance('helloworld', 'heloworl')
Out[1]: 2
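
For reference, the value 2 above can be reproduced without any third-party library. The following is a minimal sketch of the classic dynamic-programming (Wagner-Fischer) edit distance with unit costs. The helper name `lev` is hypothetical and used only for illustration; this is not textdistance's own implementation.

```python
def lev(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    # prev[j] is the distance between the prefix of `a` handled so far
    # (initially empty) and the first j characters of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning i characters of `a` into '' takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


print(lev('helloworld', 'heloworl'))  # 2
```

Dropping one `l` and the trailing `d` turns `'helloworld'` into `'heloworl'`, which is where the two edits come from.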
Maybe I'm missing something, but is there a reason you don't want a plain lambda expression? This works for me:
import pandas as pd
from textdistance import levenshtein
attempts = [['passw0rd', 'pasw0rd'],
            ['passwrd', 'psword'],
            ['psw0rd', 'passwor'],
            ['helloworld', 'heloworl']]
df = pd.DataFrame(attempts, columns=['password', 'attempt'])
df.apply(lambda x: levenshtein.distance(x['password'], x['attempt']), axis=1)
Out:
0 1
1 3
2 4
3 2
dtype: int64
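
As for why the `*zip(...)` version in the question does not behave this way, here is a small sketch using only built-ins, with the first row's values copied in by hand. The `+` concatenates the two strings before `zip` sees them, so `zip` receives a single iterable and yields one 1-tuple per character:

```python
password, attempt = 'passw0rd', 'pasw0rd'

# `+` joins the strings first, so zip() is given exactly one iterable ...
joined = password + attempt

# ... and produces a 1-tuple for every character of the joined string.
args = list(zip(joined))

print(args[:3])   # [('p',), ('a',), ('s',)]
print(len(args))  # 15
```

Unpacking that with `*` hands the function 15 single-character tuples rather than the two strings it expects, which is why passing `x['password']` and `x['attempt']` directly is the simpler route.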