PSK*_*PSK 4 python regex dataframe pandas
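I have a Pandas DataFrame (df) in which some words contain encoding replacement characters. I want to replace these words with their replacements (translations) from a dictionary.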
import pandas as pd
translations = {'gr?nn': 'gronn', 'm?nst': 'menst'}
df = pd.DataFrame(["gr?nn Y", "One gr?nn", "Y m?nst/line X"])
df.replace(translations, regex=True, inplace=True)
However, it does not seem to catch all instances. Current output:
                0
0         gronn Y
1       One gr?nn
2  Y m?nst/line X
Do I need to specify a regex pattern so that the replacement also catches partial words within a string?
Expected output:
                0
0         gronn Y
1       One gronn
2  Y menst/line X
Turn your translations into regex find/replace strings:
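import pandas as pd
# An unescaped ? is a regex quantifier, so write \? to match a literal "?" in the data.
translations = {r'(.*)gr\?nn(.*)': r'\1gronn\2', r'(.*)m\?nst(.*)': r'\1menst\2'}
df = pd.DataFrame(["gr?nn Y", "One gr?nn", "Y m?nst/line X"])
df.replace(translations, regex=True)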
Returns:
                0
0         gronn Y
1       One gronn
2  Y menst/line X
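If the placeholder in the data really is a literal "?", one thing to watch is that "?" is a quantifier in a regex, so a pattern such as gr?nn matches "gnn" or "grnn" rather than the text "gr?nn". The sketch below is one possible way to avoid hand-writing patterns: it uses only the standard re module, escapes each dictionary key with re.escape, and substitutes in a single pass. The helper name translate and the rows list are made up for illustration, and the sample strings are assumed to match the ones in the question.

```python
import re

# Keys are the literal text as it appears in the data (assumed to contain "?").
translations = {"gr?nn": "gronn", "m?nst": "menst"}

# Unescaped, "gr?nn" means "g, optional r, nn", so it does not match "gr?nn".
unescaped_match = re.search("gr?nn", "One gr?nn")

# re.escape makes each key match literally; join them into one alternation.
pattern = re.compile("|".join(re.escape(key) for key in translations))

def translate(text):
    """Replace every occurrence of a dictionary key, anywhere in the string."""
    return pattern.sub(lambda match: translations[match.group(0)], text)

rows = ["gr?nn Y", "One gr?nn", "Y m?nst/line X"]
fixed = [translate(row) for row in rows]
print(fixed)
```

With a DataFrame, the same idea should carry over either by applying the helper per cell, for example df[0].map(translate), or by passing the escaped keys directly, for example df.replace({re.escape(k): v for k, v in translations.items()}, regex=True).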