Mik*_*.K. 1 python arrays string list
A recent project of mine requires splitting an incoming phrase (passed as a string) into its component sentences. For example, this string:
"Your mother was a hamster, and your father smelt of elderberries! Now go away, or I shall taunt you a second time. You know what, never mind. This entire sentence is far too silly. Wouldn't you agree? I think it is."
needs to be turned into a list made up of the following elements:
["Your mother was a hamster, and your father smelt of elderberries",
"Now go away, or I shall taunt you a second time",
"You know what, never mind",
"This entire sentence is far too silly",
"Wouldn't you agree",
"I think it is"]
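For the purposes of this function, a "sentence" is a string terminated by a ".", "?", or "!". Note that the punctuation should be removed from the output, as shown above.
I have a working version, but it's ugly, it leaves leading and trailing whitespace, and I can't help thinking there is a better way: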
from functools import reduce

def split_sentences(st):
    if type(st) is not str:
        raise TypeError("Cannot split non-strings")
    sl = st.split('.')
    sl = [s.split('?') for s in sl]
    sl = reduce(lambda x, y: x+y, sl)  # Flatten the list
    sl = [s.split('!') for s in sl]
    return reduce(lambda x, y: x+y, sl)
Use re.split instead, with a regular expression that matches any sentence-ending character (and any whitespace that follows it).
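import re

def split_sentences(st):
    # Split on ".", "?", or "!" plus any whitespace that follows it.
    sentences = re.split(r'[.?!]\s*', st)
    # Text ending in a terminator leaves an empty last element; drop it.
    if sentences[-1]:
        return sentences
    else:
        return sentences[:-1]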
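If the input might also contain repeated terminators (such as "...") or leading whitespace, one possible variant is to split on runs of terminators and strip each piece afterwards. This is only a sketch: the name split_sentences_stripped is made up for illustration, and like the version above it makes no attempt to handle abbreviations or decimal numbers.

```python
import re

def split_sentences_stripped(text):
    # Hypothetical variant: split on one or more of ".", "?", "!" in a row.
    parts = re.split(r'[.?!]+', text)
    # Strip surrounding whitespace and drop pieces that end up empty.
    return [p.strip() for p in parts if p.strip()]

sample = ("Your mother was a hamster, and your father smelt of elderberries! "
          "Now go away, or I shall taunt you a second time. "
          "You know what, never mind. This entire sentence is far too silly. "
          "Wouldn't you agree? I think it is.")

for sentence in split_sentences_stripped(sample):
    print(sentence)
```

For the sample string from the question, this should give the same six sentences as the list shown there.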