Python: cut a string after the Xth sentence

Hel*_*nar 4 python string

I have to cut a unicode string that is actually an article (it contains sentences). I want to cut this article string after the Xth sentence in Python.

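A good indicator that a sentence has ended is that it ends with a full stop (".") and the word after it starts with a capital letter. For example: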

myarticle = "Hi, this is my first sentence. And this is my second. Yet this is my third."

How can I achieve this?

Thanks

Tim*_*ara 15

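Consider downloading the Natural Language Toolkit (NLTK). With it you can split text into sentences without breaking on abbreviations such as "U.S.A." and without failing to split sentences that end in "?!".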

>>> import nltk
>>> paragraph = u"Hi, this is my first sentence. And this is my second. Yet this is my third."
>>> sentences = nltk.sent_tokenize(paragraph)
>>> sentences
[u"Hi, this is my first sentence.", u"And this is my second.", u"Yet this is my third."]

Your code becomes more readable. To access the second sentence, use the indexing notation you are used to.

>>> sentences[1]
u"And this is my second."
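To actually cut the article after the Xth sentence, the list of sentences can be sliced and joined back together. The following is only a minimal sketch that uses the heuristic from the question (a period, then whitespace, then an uppercase letter) instead of NLTK, so it runs with the standard library alone. The names `split_sentences` and `cut_after_sentence` are made up for illustration, and the pattern assumes ASCII capitals; it will wrongly split after abbreviations like "Mr. Smith".

```python
import re

# Heuristic from the question: a sentence ends at a period that is followed
# by whitespace and then an uppercase (ASCII) letter.
_SENTENCE_BOUNDARY = re.compile(r"(?<=\.)\s+(?=[A-Z])")


def split_sentences(text):
    """Naive splitter; it does not handle abbreviations, '?' or '!'."""
    return _SENTENCE_BOUNDARY.split(text.strip())


def cut_after_sentence(text, x):
    """Return the first x sentences of text, rejoined with single spaces."""
    return " ".join(split_sentences(text)[:x])


myarticle = "Hi, this is my first sentence. And this is my second. Yet this is my third."
print(cut_after_sentence(myarticle, 2))
# Hi, this is my first sentence. And this is my second.
```

With NLTK installed, the same slicing should work on the tokenizer's output, e.g. `" ".join(nltk.sent_tokenize(myarticle)[:2])`, which is likely the more robust choice for real articles.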