I have a test string:
content = 'I opened my mouth, "Good morning!" I said cheerfully'
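I want to use a regular expression to remove the text between the double quotation marks, but not the quotation marks themselves, so the result would be: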
'I opened my mouth, "" I said cheerfully'
I am using the following code:
content = re.sub(r'".*"'," ",content)
But this also removes the double quotation marks. What pattern should I use to keep the quotation marks but remove the text inside them?
fal*_*tru 10
Use '""' as the replacement string:
>>> content = 'I opened my mouth, "Good morning!" I said cheerfully'
>>> content = re.sub(r'".*"', '""', content)
>>> print(content)
I opened my mouth, "" I said cheerfully
BTW, .* matches as much as possible (greedy). To match non-greedily, use .*? or [^"]*.
>>> content = 'I opened my mouth, "Good morning!" I said cheerfully. "How is everyone?"'
>>> content = re.sub(r'".*?"', '""', content)
>>> print(content)
I opened my mouth, "" I said cheerfully. ""
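To make the difference concrete, here is a minimal sketch that runs the three patterns mentioned above against the same two-quote string. The variable names (text, greedy, lazy, negated) are only illustrative and are not part of the original answer.

```python
import re

text = 'I opened my mouth, "Good morning!" I said cheerfully. "How is everyone?"'

# Greedy: .* runs from the first quote to the LAST quote in the string.
greedy = re.sub(r'".*"', '""', text)

# Non-greedy: .*? stops at the nearest closing quote.
lazy = re.sub(r'".*?"', '""', text)

# Negated class: [^"]* cannot cross a quote, so it also stops at the nearest one.
negated = re.sub(r'"[^"]*"', '""', text)

print(greedy)   # I opened my mouth, ""
print(lazy)     # I opened my mouth, "" I said cheerfully. ""
print(negated)  # I opened my mouth, "" I said cheerfully. ""
```

One practical difference between the last two: by default . does not match a newline, so .*? will not match a quoted phrase that spans several lines unless re.DOTALL is passed, whereas [^"]* matches newlines as well.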
You can also use lookarounds:
(?<=")([^"]+)(?=")

content = re.sub(r'(?<=")([^"]+)(?=")', '', content)
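Two notes:
.* would match everything up to the last double quote in the string, not the next one. That is why I used [^"]+ instead.
Importantly, this does not work when the string contains two double-quoted substrings, unless you advance the index at which the next search starts. The lookarounds do not consume the quotes, so the closing quote of one phrase can also act as the opening quote of the next match. For example, with
'I opened my mouth, "Good morning!" I said cheerfully. "How is everyone?"'
to avoid capturing " I said cheerfully. ", the search index has to be advanced by 1 after "Good morning!".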
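The pitfall can be reproduced with a short sketch. Instead of tracking the search index by hand, one possible workaround (my suggestion, not part of the answer above) is to let the pattern consume both quotes so that they are paired up correctly. The names lookaround and consumed are illustrative.

```python
import re

text = 'I opened my mouth, "Good morning!" I said cheerfully. "How is everyone?"'

# Lookarounds leave the quotes unconsumed, so the closing quote of one
# phrase is reused as the opening quote of the next match. The text
# between the two quoted phrases is removed as well.
lookaround = re.sub(r'(?<=")([^"]+)(?=")', '', text)

# Consuming both quotes in the match pairs them up, so only the quoted
# phrases are emptied.
consumed = re.sub(r'"[^"]*"', '""', text)

print(lookaround)  # I opened my mouth, """"
print(consumed)    # I opened my mouth, "" I said cheerfully. ""
```

With a single quoted phrase in the string, both versions give the same result; they only diverge once a second pair of quotes appears.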