How to correct spelling in a Pandas DataFrame

RDJ*_*RDJ 6 python nlp pandas textblob

With the TextBlob library, you can correct the spelling of a string by first wrapping it in a TextBlob object and then calling its correct method.

Example:

from textblob import TextBlob
data = TextBlob('Two raods diverrged in a yullow waod and surry I culd not travl bouth')
print(data.correct())
Two roads diverged in a yellow wood and sorry I could not travel both

Is it possible to do this for the strings in a column (Series) of a Pandas DataFrame, for example:

import pandas as pd

data = [{'one': '3', 'two': 'Two raods'},
        {'one': '7', 'two': 'diverrged in a yullow'},
        {'one': '8', 'two': 'waod and surry I'},
        {'one': '9', 'two': 'culd not travl bouth'}]
df = pd.DataFrame(data)
df

    one   two
0   3     Two raods
1   7     diverrged in a yullow
2   8     waod and surry I
3   9     culd not travl bouth

so that it returns this:

    one   two
0   3     Two roads
1   7     diverged in a yellow
2   8     wood and sorry I
3   9     could not travel both

using TextBlob or some other method?

Ami*_*ory 2

You can do this:

import textblob
df.two.apply(lambda txt: ''.join(textblob.TextBlob(txt).correct()))

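This uses pandas.Series.apply, which calls the function on each value of the column and returns a new Series. To keep the corrected text, assign the result back to the column, e.g. df['two'] = df['two'].apply(...).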
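If the column may also contain missing or non-string values, a slightly more defensive version of the same idea could look like the sketch below. The helper names textblob_correct and correct_column are made up for illustration and are not part of pandas or TextBlob; the sketch only assumes that TextBlob(text).correct() behaves as shown in the question.

```python
import pandas as pd


def textblob_correct(text):
    """Correct a single string with TextBlob (hypothetical helper)."""
    # Imported here so that correct_column can also be used with
    # another corrector when textblob is not installed.
    from textblob import TextBlob
    return str(TextBlob(text).correct())


def correct_column(series, corrector=textblob_correct):
    """Return a new Series with `corrector` applied to every string value.

    Non-string values (None, NaN, numbers) are passed through unchanged.
    """
    return series.apply(lambda v: corrector(v) if isinstance(v, str) else v)
```

With the DataFrame from the question, this would be called as df['two'] = correct_column(df['two']). Passing the corrector as an argument is just a convenience for swapping TextBlob for another spell checker.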