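I am trying to write a program that capitalizes the first letter of each sentence. This is what I have so far, but I cannot figure out how to add the period back in between sentences. For example, if I input: hello. goodbye, the output is Hello Goodbye, and the period has disappeared.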
string = input('Enter a sentence/sentences please:')
sentence = string.split('.')
for i in sentence:
    print(i.capitalize(), end='')
You can use nltk for sentence segmentation:
#!/usr/bin/env python3
import textwrap
from pprint import pprint
import nltk.data # $ pip install http://www.nltk.org/nltk3-alpha/nltk-3.0a3.tar.gz
# python -c "import nltk; nltk.download('punkt')"
sent_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
text = input('Enter a sentence/sentences please:')
print("\n" + textwrap.fill(text))
sentences = sent_tokenizer.tokenize(text)
sentences = [sent.capitalize() for sent in sentences]
pprint(sentences)
Enter a sentence/sentences please: a period might occur inside a sentence e.g., see! and the sentence may end without the dot!
['A period might occur inside a sentence e.g., see!', 'And the sentence may end without the dot!']
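One caveat about str.capitalize(), which the snippet above relies on: it uppercases the first character and lowercases everything else, so acronyms and proper names inside a sentence lose their capitals. A minimal sketch of an alternative that touches only the first character; capitalize_first is a hypothetical helper name, not part of nltk, and the sentence list is hard-coded for illustration rather than produced by the tokenizer:

```python
def capitalize_first(sentence):
    """Uppercase the first character and leave the rest unchanged."""
    # Slicing with [:1] avoids an IndexError on an empty string.
    return sentence[:1].upper() + sentence[1:]


# Sentences as a tokenizer might return them (made up for this example).
sentences = ["i use NLTK for this.", "and Python too!"]

print([capitalize_first(s) for s in sentences])
# ['I use NLTK for this.', 'And Python too!']

print([s.capitalize() for s in sentences])
# ['I use nltk for this.', 'And python too!']
```

Whether lowercasing the rest of the sentence matters depends on the input; for all-lowercase text the two approaches give the same result.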
You can use regular expressions. Define a regex that matches the first word of a sentence:
import re
p = re.compile(r'(?<=[\.\?!]\s)(\w+)')
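This regex contains a positive lookbehind assertion (?<=...) that matches either ., ? or !, followed by a whitespace character \s. The lookbehind is followed by a group that matches one or more alphanumeric characters, \w+. In effect, the regex matches the first word after the end of a sentence. Because a lookbehind does not consume characters, the punctuation and the space are not part of the match and stay in the text.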
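To see which words the pattern picks up, here is a quick check with findall(). The sample string is only an illustration:

```python
import re

# The word must be preceded by '.', '?' or '!' and exactly one whitespace character.
p = re.compile(r'(?<=[\.\?!]\s)(\w+)')

sample = 'Your text here. this is fun! yay.'
matches = p.findall(sample)
print(matches)  # ['this', 'yay']
```

Note that the very first word of the string is not matched, since nothing precedes it.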
You can define a function that capitalizes a regex match object and pass this function to sub():
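def cap(match):
    # match.group() is the matched word; capitalize() uppercases its first letter
    return match.group().capitalize()
p.sub(cap, 'Your text here. this is fun! yay.')
# 'Your text here. This is fun! Yay.'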
You might want to do the same with another regex that matches the word at the beginning of the string:
p2 = re.compile(r'^\w+')
Or make the original regex harder to read by combining the two:
p = re.compile(r'((?<=[\.\?!]\s)(\w+)|(^\w+))')
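Putting the pieces together, a minimal end-to-end sketch with the combined pattern and the cap() callback might look like this. The input string is made up for illustration:

```python
import re

# Either a word right after sentence-ending punctuation plus one whitespace
# character, or the word at the very start of the string.
p = re.compile(r'((?<=[\.\?!]\s)(\w+)|(^\w+))')


def cap(match):
    # str.capitalize() uppercases the first letter and lowercases the rest.
    return match.group().capitalize()


result = p.sub(cap, 'your text here. this is fun! yay.')
print(result)  # Your text here. This is fun! Yay.
```

Since sub() only replaces the matched words, the periods and other punctuation are preserved. One limitation of this pattern: the lookbehind expects exactly one whitespace character after the punctuation, so input such as hello.goodbye (no space) or sentences separated by two spaces would leave the second sentence uncapitalized.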