Python Regex - Remove special characters but preserve apostrophes

Question

I am trying to remove all special characters from some text. Here is my regex:

pattern = re.compile(r'[\W_]+', re.UNICODE)
words = str(pattern.sub(' ', words))

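Super simple, but unfortunately it causes problems when apostrophes (single quotes) are involved. For example, if I have the word "doesn't", this code returns "doesn".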

Is there a way to adjust this regex so that it does not remove apostrophes in cases like this?

Edit: this is what I am after:

doesn't this mean it -technically- works?

should become:

doesn't this mean it technically works

Answer 1

tob*_*xen 13

Like this?

>>> import re
>>> pattern = re.compile(r"[^\w']")
>>> pattern.sub(' ', "doesn't it rain today?")
"doesn't it rain today "

If underscores should be filtered out as well:

>>> re.compile(r"[^\w']|_").sub(" ", "doesn't this _technically_ means it works? naïve I am ...")
"doesn't this  technically  means it works  naïve I am    "

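Both snippets above leave runs of spaces behind, while the output asked for in the question has single spaces. Here is a minimal sketch of one way to get there, assuming Python 3, where `\w` is Unicode-aware for `str` patterns. The name `strip_special` is hypothetical and not part of the original answer:

```python
import re

# Matches any character that is neither a word character nor an
# apostrophe, plus the underscore (which \w would otherwise keep).
_SPECIAL = re.compile(r"[^\w']|_")


def strip_special(text: str) -> str:
    """Replace special characters with spaces, keep apostrophes,
    then collapse whitespace runs and trim the ends."""
    spaced = _SPECIAL.sub(" ", text)
    # str.split() with no arguments splits on any run of whitespace,
    # so joining with a single space normalises the gaps.
    return " ".join(spaced.split())


print(strip_special("doesn't this mean it -technically- works?"))
```

This keeps every apostrophe, including stray quote marks that are not inside a word; whether those should go too depends on the input text.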
