Remove a substring using Python

Wen*_*SHE 53 python regex string

I have extracted some information from a forum. This is the raw string I have now:

string = 'i think mabe 124 + <font color="black"><font face="Times New Roman">but I don\'t have a big experience it just how I see it in my eyes <font color="green"><font face="Arial">fun stuff'

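What I don't want are the substrings "<font color="black"><font face="Times New Roman">" and "<font color="green"><font face="Arial">". I want to keep every other part of the string and drop only these. So the result should look like this: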

resultString = "i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"

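How can I do this? I actually used Beautiful Soup to extract the string above from the forum. Now I would prefer to use a regular expression to remove those parts.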

jul*_*ria 102

import re
re.sub('<.*?>', '', string)
"i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"

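The re.sub function takes a regular expression and replaces every match in the string with the second argument. In this case, we search for all tags ('<.*?>') and replace them with nothing ('').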

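The ? makes the search non-greedy in re, so .* matches as few characters as possible and each match stops at the first > it reaches.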
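To make the effect of the ? concrete, here is a minimal sketch that runs both the non-greedy and the greedy pattern on a shortened version of the string from the question. The variable names text, non_greedy and greedy are made up for this illustration.

```python
import re

# Shortened version of the string from the question (illustrative only).
text = 'mabe 124 + <font color="black"><font face="Arial">but I see it <font color="green">fun stuff'

# Non-greedy: each match ends at the first '>' after a '<', so every tag
# is removed separately and the text between tags is kept.
non_greedy = re.sub(r'<.*?>', '', text)

# Greedy: a single match runs from the first '<' to the last '>',
# which also removes the text between the tags.
greedy = re.sub(r'<.*>', '', text)

print(non_greedy)  # mabe 124 + but I see it fun stuff
print(greedy)      # mabe 124 + fun stuff
```

Without the ?, the words between the first and the last tag would be lost, which is why the non-greedy form is the one that fits here.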

More about the re module.

  • This was very helpful, thank you. I used it to remove the mentions (@s) from the tweets in my project: re.sub('@.*?','',tweetText) (2 upvotes)
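A note on the pattern in this comment: a non-greedy .*? with nothing after it matches zero characters, so '@.*?' removes only the @ sign and leaves the username in place. The sketch below compares it with '@\w+', which is one possible way to drop the whole mention. The sample tweet and the variable names are hypothetical, and '@\w+' is an assumption about what counts as a username (letters, digits and underscores).

```python
import re

# Hypothetical sample tweet, not taken from the thread.
tweet_text = "@alice thanks for the tip, cc @bob_42!"

# '.*?' has nothing to stop at, so it matches the empty string:
# only the '@' characters are removed.
only_at_removed = re.sub(r'@.*?', '', tweet_text)

# '@\w+' matches the '@' plus the word characters that follow it.
mentions_removed = re.sub(r'@\w+', '', tweet_text)

print(only_at_removed)   # alice thanks for the tip, cc bob_42!
print(mentions_removed)  #  thanks for the tip, cc !
```

The second result keeps the surrounding spaces and punctuation, so a real cleanup step would probably also normalize whitespace afterwards.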

Abh*_*jit 15

>>> import re
>>> st = " i think mabe 124 + <font color=\"black\"><font face=\"Times New Roman\">but I don't have a big experience it just how I see it in my eyes <font color=\"green\"><font face=\"Arial\">fun stuff"
>>> re.sub("<.*?>","",st)
" i think mabe 124 + but I don't have a big experience it just how I see it in my eyes fun stuff"
>>> 
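One caveat with '<.*?>' is that . does not match a newline by default, so a tag that is broken across lines is left untouched. A common alternative, sketched here under that assumption, is the character class '<[^>]*>', which matches anything except > and therefore also spans line breaks. TAG_PATTERN, strip_tags and the multiline sample are hypothetical names and data introduced for this example.

```python
import re

# Hypothetical helper: precompile the pattern once and reuse it.
TAG_PATTERN = re.compile(r'<[^>]*>')

def strip_tags(text):
    """Remove every '<...>' span, including ones that contain newlines."""
    return TAG_PATTERN.sub('', text)

# Made-up sample in which a tag is split across two lines.
multiline = 'mabe 124 + <font\ncolor="black">but fun stuff'

with_char_class = strip_tags(multiline)
with_dot = re.sub(r'<.*?>', '', multiline)

print(with_char_class)     # mabe 124 + but fun stuff
print(with_dot == multiline)  # True: '.' stops at the newline, nothing matched
```

Passing flags=re.DOTALL to re.sub would make the original pattern handle this case too. Either way, this is a quick cleanup for simple strings like the one in the question, not a full HTML parser.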