Bai*_*vfc 0 python string text nlp period
I have a sentence in a text file that I want to display in Python, but in such a way that a new line starts after every full stop (period).

For example, my paragraph is

"Dr. Harrison bought bargain.co.uk for 2.5 million pounds, i.e. he
paid a lot for it. Did he mind? John Smith, Esq. thinks he didn't.
Nevertheless, this isn't true... Well, with a probability of .9 it
isn't."

but I want it displayed as follows

"Dr. Harrison bought bargain.co.uk for 2.5 million pounds, i.e. he
paid a lot for it.
Did he mind? John Smith, Esq. thinks he didn't.
Nevertheless, this isn't true...
Well, with a probability of .9 it isn't."

The other full stops that occur in the text (for example in "Dr.", "Esq.", ".9", the website address and, of course, the first two dots of the ellipsis) make this increasingly difficult.

I'm not sure how to handle the other periods in the text file. Can anyone help? Thanks.

"Your task is to write a program that, given the name of a text file,
is able to write its content with each sentence on a separate line." <-- the task as set
This works for your text:
text = "Dr. Harrison bought bargain.co.uk for 2.5 million pounds, i.e. he "\
"paid a lot for it. Did he mind? John Smith, Esq. thinks he didn't. "\
"Nevertheless, this isn't true... Well, with a probability of .9 it "\
"isn't."
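import re
# Split after a period followed by spaces and a capital letter,
# except when the period ends "Dr." or "Esq."
pat = r'(?<!Dr)(?<!Esq)\. +(?=[A-Z])'
print(re.sub(pat, '.\n', text))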
Result:
Dr. Harrison bought bargain.co.uk for 2.5 million pounds, i.e. he paid a lot for it.
Did he mind? John Smith, Esq. thinks he didn't.
Nevertheless, this isn't true...
Well, with a probability of .9 it isn't.
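However, no single regex pattern can be guaranteed never to fail on something as complex as human writing.
Note, for example, that I had to add a negative lookbehind assertion to exclude the Dr. case (and I did the same for Esq., although it does not cause a problem in your text, because the word that follows it, thinks, does not start with a capital letter).
I don't think every such case can be built into the regex pattern ahead of time; sooner or later an unhandled case will turn up.
Still, this code does much of the required job. Not bad, in my view.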
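One way to make the list of exceptions easier to extend is to build the lookbehinds from a sequence of abbreviations. The following is only a sketch under that assumption: the helper names `build_splitter` and `split_sentences` are hypothetical and are not part of the `re` module, and the approach has the same limits as the pattern above (any abbreviation missing from the list still causes a wrong split).

```python
import re

def build_splitter(abbreviations):
    # Python's re module requires each lookbehind to have a fixed width,
    # so one negative lookbehind is chained per abbreviation instead of
    # putting abbreviations of different lengths into a single alternation.
    guards = "".join(f"(?<!{re.escape(abbr)})" for abbr in abbreviations)
    return re.compile(guards + r"\. +(?=[A-Z])")

def split_sentences(text, abbreviations=("Dr", "Esq")):
    # Replace the period and the spaces after it with a period and a newline.
    return build_splitter(abbreviations).sub(".\n", text)

print(split_sentences("Mr. Smith met Dr. Jones. They talked.", ("Mr", "Dr")))
```

With `("Mr", "Dr")` the text is split only after "Jones.". An abbreviation that is not listed, such as "Prof." in "I met Prof. Jones.", is still treated as the end of a sentence, which illustrates the kind of unhandled case mentioned above.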
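Since the task asks for a program that takes the name of a text file, the same pattern can be wrapped in a small function that reads the file first. This is a minimal sketch, not a complete solution: the function name `sentences_from_file`, the UTF-8 default and the file name in the usage comment are assumptions, and hard line breaks in the file are collapsed into single spaces before splitting.

```python
import re

# Same pattern as above: a period followed by spaces and a capital letter,
# unless the period belongs to "Dr." or "Esq.".
SENTENCE_END = re.compile(r"(?<!Dr)(?<!Esq)\. +(?=[A-Z])")

def sentences_from_file(path, encoding="utf-8"):
    """Return the file's text as a list of lines, one per detected sentence."""
    with open(path, encoding=encoding) as handle:
        raw = handle.read()
    # Join hard-wrapped lines and collapse whitespace runs into single spaces,
    # so that a line break inside a sentence does not hide a boundary.
    flat = " ".join(raw.split())
    return SENTENCE_END.sub(".\n", flat).split("\n")

# Hypothetical usage, assuming a file named paragraph.txt exists:
# for line in sentences_from_file("paragraph.txt"):
#     print(line)
```

Like the pattern it reuses, this breaks only after periods, so "Did he mind? John Smith, Esq. thinks he didn't." stays on one line, as in the result shown above.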