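I want to remove the periods from acronyms in a string of text, but I also want to leave regular periods (such as the one at the end of a sentence) in place.
So the following sentence: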
"The C.I.A. is a department in the U.S. Government."
should become
"The CIA is a department in the US Government."
Is there a clean way to do this in Python? So far I have two steps:
import re
words = "The C.I.A. is a department in the U.S. Government."
# Step 1: drop the trailing period of each acronym
words = re.sub(r'([A-Z].[A-Z.]*)\.', r'\1', words)
print(words)
# The C.I.A is a department in the U.S Government.
# Step 2: drop the remaining periods that precede a capital letter
words = re.sub(r'\.([A-Z])', r'\1', words)
print(words)
# The CIA is a department in the US Government.
Mos*_*oye 13
Maybe this?
>>> re.sub(r'(?<!\w)([A-Z])\.', r'\1', s)
'The CIA is a department in the US Government.'
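This replaces every dot that is preceded by a single uppercase letter, provided that letter is not itself preceded by a character from the \w set. The latter condition is enforced by the negative lookbehind assertion (?<!\w).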
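As a minimal sketch of how the lookbehind behaves, the pattern can be wrapped in a small function and tried on a few inputs. The name `strip_acronym_periods` and the extra sample strings are assumptions made for illustration; only the regex itself comes from the answer.

```python
import re

# Pattern from the answer: a single capital letter followed by a dot,
# where the letter is not preceded by a word character.
ACRONYM_DOT = re.compile(r'(?<!\w)([A-Z])\.')

def strip_acronym_periods(text):
    """Hypothetical helper: remove the dot after each standalone capital letter."""
    return ACRONYM_DOT.sub(r'\1', text)

print(strip_acronym_periods("The C.I.A. is a department in the U.S. Government."))
# The CIA is a department in the US Government.

# A capital at the end of an ordinary word is preceded by a word character,
# so its period is kept.
print(strip_acronym_periods("She works at NASA."))
# She works at NASA.

# Limitation: a standalone capital that ends a sentence also loses its period.
print(strip_acronym_periods("Plan A. Then B."))
# Plan A Then B
```

The last case shows that the pattern cannot tell an acronym letter from a single-letter word at the end of a sentence, so inputs containing initials or labels such as "Plan A." may need extra handling.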