How do I compute the Levenshtein distance between two Pandas DataFrame columns?

Nic*_*ais 4 python dataframe pandas

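I am trying to compute the Levenshtein distance between two Pandas columns, but I am stuck. I am using the `textdistance` library. Here is a minimal, reproducible example: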

import pandas as pd
from textdistance import levenshtein

attempts = [['passw0rd', 'pasw0rd'],
            ['passwrd', 'psword'],
            ['psw0rd', 'passwor']]

df = pd.DataFrame(attempts, columns=['password', 'attempt'])
   password  attempt
0  passw0rd  pasw0rd
1   passwrd   psword
2    psw0rd  passwor

My poor attempt:

df.apply(lambda x: levenshtein.distance(*zip(x['password'] + x['attempt'])), axis=1)

This is how the function works. It takes two strings as arguments:

levenshtein.distance('helloworld', 'heloworl')
Out[1]: 2
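For reference, the value returned is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. A minimal pure-Python sketch of that computation follows; `edit_distance` is a hypothetical name used only for illustration, and this is not meant to reflect how `textdistance` implements it internally:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic-programming table."""
    # previous[j] is the distance between the prefix of a seen so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


print(edit_distance('helloworld', 'heloworl'))  # prints 2
```

The two-row layout keeps memory proportional to the length of the second string rather than to the product of both lengths.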

And*_*rea 8

Maybe I am missing something, but is there a reason you don't like the lambda expression? This works for me:

import pandas as pd
from textdistance import levenshtein

attempts = [['passw0rd', 'pasw0rd'],
            ['passwrd', 'psword'],
            ['psw0rd', 'passwor'],
            ['helloworld', 'heloworl']]

df = pd.DataFrame(attempts, columns=['password', 'attempt'])

# Pass the two column values of each row directly as the two string arguments
df.apply(lambda x: levenshtein.distance(x['password'], x['attempt']), axis=1)

Out:

0    1
1    3
2    4
3    2
dtype: int64
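As for why the attempt in the question does not work: my reading is that `x['password'] + x['attempt']` concatenates the two strings before `zip` ever sees them, so `zip` receives a single iterable and yields one-character tuples. Unpacking that with `*` would hand the function many arguments instead of two strings. A small sketch in plain Python, using the first row's values:

```python
password, attempt = 'passw0rd', 'pasw0rd'

# What the question's lambda builds: the strings are joined first,
# so zip() iterates over one 15-character string.
unpacked = list(zip(password + attempt))
print(len(unpacked))   # prints 15
print(unpacked[:3])    # prints [('p',), ('a',), ('s',)]

# What the function expects instead: the two strings themselves.
args = (password, attempt)
print(len(args))       # prints 2
```

Passing `x['password']` and `x['attempt']` as separate arguments avoids the problem entirely.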

  • Or use `map`: `df.assign(distance=[*map(levenshtein.distance, df.password, df.attempt)])` (2 upvotes)