RegEx to match all characters except some special characters and ":)"

mAh*_*FeR 8 python regex string regex-negation regex-lookarounds

I want to remove all characters from a string except #, @, :) and :(. Example:

this is, a placeholder text. I wanna remove symbols like ! and ? but keep @ & # & :)

This should result in (after the matches are removed):

this is a placeholder text I wanna remove symbols like  and  but keep @  #  :)

I tried:

(?! |#|@|:\)|:\()\W

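This works, but in the case of :) and :( part of the emoticon is still matched. I know it matches because the pattern is checked at each character together with the preceding ones; for example, in :) only : is matched, while in :)) the :) is matched.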

Tim*_*sen 7

This is a tricky question, because you want to remove all symbols except for a certain whitelist. In addition, some of the symbols on the whitelist actually consist of two characters:

:)
:(

To handle this, we can first spare both colon : and parentheses, then selectively remove either one should it not be part of a smiley or frown face:

import re

text = "this is, a (placeholder text). I wanna remove symbols like: ! and ? but keep @ & # & :)"
# Remove non-whitelisted symbols, lone colons, and lone parentheses
output = re.sub(r'[^\w\s:()@&#]|:(?![()])|(?<!:)[()]', '', text)
print(output)

this is a placeholder text I wanna remove symbols like  and  but keep @ & # & :)

The regex character class I used was:

[^\w\s:()@&#]

This matches any character that is not a word character, whitespace, or one of : ( ) @ & #, so the whitelist is spared from the replacement. The other two parts of the alternation then override this logic by removing a colon that is not followed by a parenthesis, and a parenthesis that is not preceded by a colon, i.e. whenever they are not part of a smiley or frown face.
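To see what each part of the alternation contributes, here is a minimal sketch that builds the same pattern from three named pieces and runs it on a few short strings. The constant names and the `strip_symbols` helper are made up for illustration and are not part of the answer's code.

```python
import re

# The three branches of the alternation, named here only for illustration.
NOT_WHITELISTED = r'[^\w\s:()@&#]'   # any symbol outside the whitelist
LONE_COLON = r':(?![()])'            # a colon NOT followed by ( or )
LONE_PAREN = r'(?<!:)[()]'           # a parenthesis NOT preceded by a colon

pattern = re.compile('|'.join([NOT_WHITELISTED, LONE_COLON, LONE_PAREN]))

def strip_symbols(text):
    """Remove every match of the combined pattern from text."""
    return pattern.sub('', text)

for sample in ["a: b", "(a)", ":) :(", "wow!? :)", ":))"]:
    print(repr(sample), '->', repr(strip_symbols(sample)))
# 'a: b' -> 'a b'
# '(a)' -> 'a'
# ':) :(' -> ':) :('
# 'wow!? :)' -> 'wow :)'
# ':))' -> ':)'
```

Note the last case: only the parenthesis directly after the colon survives, because the second one is preceded by `)` rather than `:`.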


FMc*_*FMc 5

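As others have shown, it is possible to write a regex that succeeds the way you have framed the problem. But in this case it is much simpler to write a regex that matches the things you want to keep, and then join those parts together.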

import re

rgx = re.compile(r'\w|\s|@|&|#|:\)|:\(')
orig = 'Blah!! Blah.... ### .... #@:):):) @@ Blah! Blah??? :):)#'
new = ''.join(rgx.findall(orig))
print(new)
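For reference, a small sketch of the same keep-list idea wrapped in a helper, with the results worked out for two inputs. The names `KEEP` and `keep_allowed` are hypothetical and the second sample string is invented for illustration.

```python
import re

# Keep-list pattern from the snippet above:
# word characters, whitespace, @, &, #, and the two-character forms :) and :(
KEEP = re.compile(r'\w|\s|@|&|#|:\)|:\(')

def keep_allowed(text):
    """Join every allowed token found in text, dropping everything else."""
    return ''.join(KEEP.findall(text))

orig = 'Blah!! Blah.... ### .... #@:):):) @@ Blah! Blah??? :):)#'
print(keep_allowed(orig))
# Blah Blah ###  #@:):):) @@ Blah Blah :):)#

# Lone colons and parentheses are dropped, because only the
# two-character forms :) and :( appear in the pattern.
print(keep_allowed('like: (this) :)'))
# like this :)
```

Because a bare `:`, `(` or `)` never appears as an alternative on its own, no lookarounds are needed: anything that is not explicitly listed is simply never collected by `findall`.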