I have the following string:
'Well, I've tried to say "How Doth the Little Busy Bee," but it all came different!' Alice replied in a very melancholy voice. She continued, 'I'll try again.'
Now I want to extract the following quotations:
1. Well, I've tried to say "How Doth the Little Busy Bee," but it all came different!
2. How Doth the Little Busy Bee,
3. I'll try again.
I tried the following code, but it does not give me what I want. The [^\1]* part does not work as expected. Or is the problem somewhere else?
import re
s = "'Well, I've tried to say \"How Doth the Little Busy Bee,\" but it all came different!' Alice replied in a very melancholy voice. She continued, 'I'll try again.'"
for i, m in enumerate(re.finditer(r'([\'"])(?!(?:ve|m|re|s|t|d|ll))(?=([^\1]*)\1)', s)):
    print("\nGroup {:d}: ".format(i+1))
    for g in m.groups():
        print(' '+g)
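On the [^\1]* point: in Python's re module, a numeric escape inside a character class is read as a character code, not as a backreference, so [^\1] means "any character except chr(1)". The sketch below illustrates this. The tempered-dot pattern at the end is a common workaround added here as an assumption; it is not taken from the post, and it does not handle apostrophes.

```python
import re

# Inside [...] the \1 is read as the character with code 1, not as group 1.
negated = re.compile(r"[^\1]")

print(bool(negated.fullmatch("'")))     # True: a quote character is still allowed
print(bool(negated.fullmatch("\x01")))  # False: only chr(1) is excluded

# Hypothetical workaround: a "tempered dot" that refuses to step over
# whatever quote character group 1 captured.
tempered = re.compile(r"(['\"])((?:(?!\1).)*)\1")
print(tempered.search('say "hi" now').group(2))  # hi
```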
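If you really need to return all results from a single regular expression applied only once, you have to use a lookahead ((?=findme)), so that the search position returns to the start after each match - see this answer for a more detailed explanation.
To prevent false matches, some extra clauses are needed for the quote characters that add complexity; for example, the apostrophe in I've should not count as an opening or closing quote. There is no single unambiguous way to do this, but the rules I followed are:
1. A" would not count as an opening quote, but ," would.
2. 'B would not count as a closing quote, but '. would.
Applying the rules above gives the following regex: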
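As a small illustration of why the lookahead matters, the sketch below uses a toy digit string (not from the post) to compare a consuming pattern with the same pattern wrapped in a capturing lookahead. Nested quotes are the same situation: the inner quotation overlaps the outer one.

```python
import re

text = "1234"

# A plain pattern consumes what it matches, so matches cannot overlap.
consuming = re.findall(r"\d\d", text)

# Wrapping the capture in a lookahead makes each match zero-width, so the
# scan resumes one character later and overlapping results are all found.
overlapping = re.findall(r"(?=(\d\d))", text)

print(consuming)    # ['12', '34']
print(overlapping)  # ['12', '23', '34']
```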
(?=(?:(?<!\w)'(\w.*?)'(?!\w)|\"(\w.*?)\"(?!\w)))
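A minimal sketch of applying this regex from Python, assuming re.finditer and the string from the question. The names pattern and quotes are illustrative, and the backslashes before " are dropped because they are not needed in a triple-quoted raw string.

```python
import re

s = ("'Well, I've tried to say \"How Doth the Little Busy Bee,\" but it all "
     "came different!' Alice replied in a very melancholy voice. "
     "She continued, 'I'll try again.'")

# The regex above, with the redundant backslashes before " removed.
pattern = re.compile(r"""(?=(?:(?<!\w)'(\w.*?)'(?!\w)|"(\w.*?)"(?!\w)))""")

# Each match is zero-width; the quoted text sits in group 1 (single quotes)
# or group 2 (double quotes), depending on which branch matched.
quotes = [m.group(1) or m.group(2) for m in pattern.finditer(s)]

for i, q in enumerate(quotes, 1):
    print(f"{i}. {q}")
```

Note that only the single-quote branch carries the (?<!\w) guard. As written, a " preceded by a word character and followed by one can still open a match, so adding the same lookbehind to the second branch may be worth considering if the input can contain such cases.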

A quick sanity check for any candidate regex is to reverse the quotes. This has been done in the regex101 demo.