How do I replace two or more underscores using Python?

cel*_*owm 2 python regex replace

For machine learning purposes, I need to "clean" some text that I am extracting, so I tried this:

texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"
texto = texto.replace(r"_{2,}"," ")
print(texto)

But the result is not what I expected:

sdf sdf s _ sfsf sdfs _________ sfsdf

I want:

sdf sdf s _ sfsf sdfs  sfsdf

Jan*_*Jan 5

You can use

import re

texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"
rx = re.compile(r'_{2,}')  # two or more consecutive underscores

texto = rx.sub('', texto)  # remove every such run
print(texto)

which yields

sdf sdf s _ sfsf sdfs  sfsdf
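The reason the attempt in the question leaves the text unchanged is that `str.replace` treats its first argument as a literal substring, not as a regular expression, so it looks for the five characters `_{2,}` and never finds them. A minimal sketch of the difference (the variable names `literal` and `cleaned` are only illustrative):

```python
import re

texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"

# str.replace searches for the literal text "_{2,}", which never occurs here,
# so the string comes back unchanged.
literal = texto.replace(r"_{2,}", " ")
print(literal == texto)  # True

# re.sub interprets "_{2,}" as a pattern: two or more consecutive underscores.
cleaned = re.sub(r"_{2,}", "", texto)
print(cleaned)
```

The single `_` survives because the quantifier `{2,}` requires at least two underscores in a row.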

If you also want to remove the whitespace that follows the underscores, change the expression to

rx = re.compile(r'_{2,}\s*')

The output will then be

sdf sdf s _ sfsf sdfs sfsdf
#                   ^^^
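If both variants are needed in the same cleaning pipeline, they could be wrapped in one small function. This is only a sketch: the helper name `strip_underscore_runs` and its `eat_trailing_space` flag are hypothetical and not part of any library. The two patterns are the ones shown in this answer.

```python
import re

# Hypothetical module-level names; compiled once so repeated calls reuse them.
_RUN = re.compile(r"_{2,}")
_RUN_WITH_SPACE = re.compile(r"_{2,}\s*")


def strip_underscore_runs(text, eat_trailing_space=False):
    """Remove runs of two or more underscores.

    With eat_trailing_space=True, any whitespace directly after a run
    is removed as well.
    """
    rx = _RUN_WITH_SPACE if eat_trailing_space else _RUN
    return rx.sub("", text)


texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"
print(strip_underscore_runs(texto))
print(strip_underscore_runs(texto, eat_trailing_space=True))
```

The first call leaves two spaces between `sdfs` and `sfsdf`; the second leaves one, because `\s*` also consumes the space after the underscores.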