How to use autocorrect on a pandas column of sentences

Question


Cyb*_*ube 2 python string autocorrect python-3.x pandas

I have a column of sentences, which I split like this:

df['ColTest'] = df['ColTest'].str.lower().str.split()


What I am trying to do is loop through every word of every sentence and apply autocorrect.spell():

for i in df['ColTest']:
    for j in i:
        df['ColTest'][i][j].replace(at.spell(j))


This throws an error:

AttributeError: 'float' object has no attribute 'replace'


The DataFrame looks like this:

ColTest
This is some test string
that might contain a finger
but this string might contain a toe
and this hass a spel error


There are no numbers in my column... any ideas?

Answer 1

Moh*_*OUI 5

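With the autocorrect library, you need to iterate over the rows of the DataFrame and then over the words in each row, applying the spell method to each word. Here is a working example: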

from autocorrect import spell 
import pandas as pd 

df = pd.DataFrame(["and this hass a spel error"], columns=["colTest"])
df.colTest.apply(lambda x: " ".join([spell(i) for i in x.split()]))


As @jpp suggested in the comments below, we can avoid the lambda by using a list comprehension:

df["colTest"] = [' '.join([spell(i) for i in x.split()]) for x in df['colTest']]


The input looks like this:

                      colTest
0  and this hass a spel error


Output:

0    and this has a spell error
Name: colTest, dtype: object
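
The AttributeError in the question mentions a float, which is the type pandas uses for missing values (NaN). Assuming the real column contains some empty cells (the question does not show any), a guard like the one below may help. This is only a sketch: `correct_sentence`, `toy_spell` and `FIXES` are hypothetical names, and `toy_spell` is a stand-in for a real spell checker so the example runs without installing autocorrect.

```python
import pandas as pd


def correct_sentence(text, spell):
    """Apply `spell` to each whitespace-separated word of `text`.

    Values that are not strings (for example NaN, which is a float)
    are returned unchanged instead of raising an error.
    """
    if not isinstance(text, str):
        return text
    return " ".join(spell(word) for word in text.split())


# Hypothetical stand-in for a spell checker: a tiny lookup table.
FIXES = {"hass": "has", "spel": "spell"}


def toy_spell(word):
    """Return the known fix for `word`, or `word` itself if none is known."""
    return FIXES.get(word, word)


# One normal sentence and one missing value (assumed, to mimic an empty cell).
df = pd.DataFrame({"colTest": ["and this hass a spel error", float("nan")]})

# Same list-comprehension pattern as above, with the guard applied per row.
df["colTest"] = [correct_sentence(x, toy_spell) for x in df["colTest"]]
print(df)
```

Any callable that maps one word to its corrected form can be passed in place of `toy_spell`, which should include the `spell` function imported above, as long as it is called one word at a time.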


Or, avoid the `lambda` with something like: `df['colTest'] = [' '.join([spell(i) for i in x.split()]) for x in df['colTest']]` (2 upvotes)
Nice post! Note that autocorrect.spell is now deprecated, but autocorrect.Speller works just as well: `from autocorrect import Speller`, `spell = Speller(lang='en')`, and the rest works as is. (2 upvotes)

Archived: 7 years, 11 months ago
Views: 942
Last activity: 7 years, 11 months ago