Posts by pyd*_*ner

Text normalization: text similarity in Python. How do I normalize text with spelling mismatches?

I have a dataframe with a column A that looks like this:

Column A
Carrefour supermarket
Carrefour hypermarket
Carrefour
carrefour
Carrfour downtown
Carrfor market
Lulu
Lulu Hyper
Lulu dxb
lulu airport
k.m trading
KM Trading
KM trade
K.M.  Trading
KM.Trading

From it, I want to derive the following Column A:

Column A
Carrefour
Carrefour
Carrefour
Carrefour
Carrefour
Carrefour
Lulu
Lulu
Lulu
Lulu
KM Trading
KM Trading
KM Trading
KM Trading
KM Trading

To do this, I wrote the following code:

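import re

# Maps a normalized key to the canonical merchant name.
MERCHANT_NAME_DICT = {"lulu": "Lulu", "carrefour": "Carrefour", "km": "KM Trading"}

def replace_merchant_name(row):
    """Provided a long merchant name replace it with short name."""
    # Lowercase the name and remove all whitespace and dots.
    processed_row = re.sub(r'\s+|\.', '', row.lower()).strip() …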
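The snippet above is cut off, so the rest of the author's function is unknown. As a minimal sketch of one way the mapping could be completed (not the original code), the example below first checks whether the collapsed name starts with a dictionary key, then falls back to fuzzy matching on the first word with `difflib.get_close_matches` from the standard library. The function name `normalize_merchant_name` and the `cutoff` of 0.8 are assumptions chosen for illustration.

```python
import difflib
import re

# Same mapping as in the question: normalized key -> canonical merchant name.
MERCHANT_NAME_DICT = {"lulu": "Lulu", "carrefour": "Carrefour", "km": "KM Trading"}


def normalize_merchant_name(name, cutoff=0.8):
    """Return the canonical merchant name, or the input unchanged if none fits.

    Hypothetical helper; `cutoff` is an assumed similarity threshold.
    """
    # Step 1: lowercase and drop whitespace and dots, as in the question,
    # so "K.M.  Trading" and "KM.Trading" both become "kmtrading".
    collapsed = re.sub(r"\s+|\.", "", name.lower())
    for key, canonical in MERCHANT_NAME_DICT.items():
        if collapsed.startswith(key):
            return canonical

    # Step 2: no exact prefix, so fuzzy-match the first word to catch
    # misspellings such as "Carrfour" or "Carrfor".
    words = name.lower().replace(".", "").split()
    if not words:
        return name
    matches = difflib.get_close_matches(
        words[0], list(MERCHANT_NAME_DICT), n=1, cutoff=cutoff
    )
    return MERCHANT_NAME_DICT[matches[0]] if matches else name


if __name__ == "__main__":
    for raw in ["Carrefour hypermarket", "Carrfor market", "lulu airport", "KM.Trading"]:
        print(raw, "->", normalize_merchant_name(raw))
```

On a dataframe `df`, this could be applied with `df["Column A"].apply(normalize_merchant_name)`, assuming the column is literally named `Column A`. Note that the prefix check is deliberately loose: a short key such as `km` would also capture unrelated names that happen to start with those letters, so the keys and the cutoff would need tuning on real data.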

nlp python-3.x

10 votes, 2 answers, 301 views
