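I'm using regular expressions in Python and trying to split a chunk of text on periods and exclamation marks. However, when I split it, I get None values in the result.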
a = "This is my text...I want it to split by periods. I also want it to split \
by exclamation marks! Is that so much to ask?"
Here is my code:
re.split('((?<=\w)\.(?!\..))|(!)',a)
Note that I use (?<=\w)\.(?!\..) because I want the split to skip ellipses. However, the code above spits out:
['This is my text...I want it to split by periods', '.', None, ' \
I also want it to split by exclamation marks', None, '!', \
' Is that so much to ask?']
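As you can see, wherever there is a period or an exclamation mark, an extra None is added to my list. Why does this happen, and how do I get rid of it?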
And*_*ark 11
Try the following:
re.split(r'((?<=\w)\.(?!\..)|!)', a)
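You are getting None because your pattern has two capturing groups, and re.split() includes every group in its result.
So whenever you match a . the second capturing group is None, and whenever you match a ! the first capturing group is None.
With a single capturing group around the whole alternation, the result is: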
['This is my text...I want it to split by periods',
'.',
' I also want it to split by exclamation marks',
'!',
' Is that so much to ask?']
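To see where the None values come from, a small sketch like the one below prints what each capturing group holds for every match of the original two-group pattern. It reuses the text and pattern from the question; the names two_groups and group_values are only illustrative.

```python
import re

a = ("This is my text...I want it to split by periods. I also want it to split "
     "by exclamation marks! Is that so much to ask?")

# The question's pattern: two separate capturing groups, one per alternative.
two_groups = r'((?<=\w)\.(?!\..))|(!)'

# For each match, only the group whose alternative matched is filled in.
# The other group is None, and re.split() puts every group into its result.
group_values = [m.groups() for m in re.finditer(two_groups, a)]
print(group_values)  # [('.', None), (None, '!')]
```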
If you don't want '.' and '!' in the result at all, remove the parentheses around the entire expression: r'(?<=\w)\.(?!\..)|!'
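As a quick check of the variant without capturing parentheses, here is a minimal sketch using the same text from the question. With no capturing group in the pattern, re.split() returns only the pieces between the delimiters; the name parts is only illustrative.

```python
import re

a = ("This is my text...I want it to split by periods. I also want it to split "
     "by exclamation marks! Is that so much to ask?")

# No capturing groups, so the delimiters themselves are dropped from the result.
parts = re.split(r'(?<=\w)\.(?!\..)|!', a)
print(parts)
# ['This is my text...I want it to split by periods',
#  ' I also want it to split by exclamation marks',
#  ' Is that so much to ask?']
```

Each piece after the first keeps its leading space; something like [p.strip() for p in parts] would tidy that up if needed.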