cel*_*owm 2 python regex replace
For machine learning purposes, I need to "clean" some text I'm extracting, so I tried this:
texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"
texto = texto.replace(r"_{2,}"," ")
print(texto)
But the result is not what I expected:
sdf sdf s _ sfsf sdfs _________ sfsdf
I want:
sdf sdf s _ sfsf sdfs sfsdf
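str.replace treats its first argument as a literal string, not a regular expression, so the pattern never matches. You can use the re module instead: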
import re
texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"
rx = re.compile(r'_{2,}')
texto = rx.sub('', texto)
which yields (note the two spaces left where the underscores were)
sdf sdf s _ sfsf sdfs  sfsdf
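To make the difference concrete, here is a minimal sketch that runs both calls on the string from the question. The variable names `literal_result` and `regex_result` are only illustrative and do not come from the original post.

```python
import re

texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"

# str.replace searches for the literal characters "_{2,}", which never
# occur in the text, so the string comes back unchanged.
literal_result = texto.replace(r"_{2,}", " ")

# re.sub interprets the pattern as a regex: two or more consecutive
# underscores. The single "_" is left alone.
regex_result = re.sub(r"_{2,}", "", texto)

print(literal_result == texto)  # True
print(repr(regex_result))       # repr makes the remaining double space visible
```

Printing with `repr` is just a convenience here, since a doubled space is easy to miss in plain output.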
If you also want to remove the whitespace that follows the underscores, change the expression to
rx = re.compile(r'_{2,}\s*')
The output will then be
sdf sdf s _ sfsf sdfs sfsdf
#                   ^^^
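If the cleanup has to run over many extracted strings, one possible way to package it is a small helper around the precompiled pattern. This is only a sketch: the names `strip_underscore_runs` and `UNDERSCORE_RUN` and the sample strings are hypothetical, and only the pattern itself is taken from the answer above.

```python
import re

# Pattern from the answer: two or more underscores plus any whitespace
# that follows them. Compiled once so it can be reused across calls.
UNDERSCORE_RUN = re.compile(r"_{2,}\s*")


def strip_underscore_runs(text: str) -> str:
    """Remove runs of 2+ underscores together with trailing whitespace."""
    return UNDERSCORE_RUN.sub("", text)


# Made-up sample inputs for illustration.
samples = [
    "sdf sdf s _ sfsf sdfs _________ sfsdf",
    "name: ____ date: __",
    "keep_single_underscores",
]
cleaned = [strip_underscore_runs(s) for s in samples]

for before, after in zip(samples, cleaned):
    print(repr(before), "->", repr(after))
```

Note that whitespace before a run is kept, so a run at the very end of a string leaves a trailing space behind; calling `.strip()` on the result would be one way to handle that if it matters.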