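I'm trying to write code that splits a sentence into words, leaving out the punctuation. For example, if the user enters "Hello, how are you?", I want to split it into ['Hello', 'how', 'are', 'you'].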
```python
userinput = str(raw_input("Enter your sentence: "))

def sentence_split(sentence):
    result = []
    current_word = ""
    for letter in sentence:
        if letter.isalnum():
            current_word += letter
        else: ## this is a symbol or punctuation, e.g. reach end of a word
            if current_word:
                result.append(current_word)
                current_word = "" ## reinitialise for creating a new word
    return result

print "Split of your sentence:", sentence_split(userinput)
```
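My code works so far, but if the sentence doesn't end with a punctuation mark, the last word is missing from the result. For example, if the input is "Hello, how are you", the result is ['Hello', 'how', 'are']. I suppose this is because there is no punctuation mark to tell the code that the last word has ended. Is there a way to make the program detect that it has reached the end of the string, so that the input "Hello, how are you" still gives ['Hello', 'how', 'are', 'you']?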
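The guess above is right: the loop only appends `current_word` in the `else` branch, which runs when a non-alphanumeric character is seen, so a word that runs all the way to the end of the string is never appended. A minimal sketch of one common fix, assuming Python 3 (so `print()` is a function), keeps the same loop and adds one check after it:

```python
def sentence_split(sentence):
    """Split sentence into words made of alphanumeric characters only."""
    result = []
    current_word = ""
    for letter in sentence:
        if letter.isalnum():
            current_word += letter
        elif current_word:
            # A non-alphanumeric character ends the current word.
            result.append(current_word)
            current_word = ""
    # The string may end in the middle of a word (no trailing punctuation),
    # so flush whatever is left once the loop is done.
    if current_word:
        result.append(current_word)
    return result


print(sentence_split("Hello, how are you"))   # ['Hello', 'how', 'are', 'you']
print(sentence_split("Hello, how are you?"))  # ['Hello', 'how', 'are', 'you']
```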
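I haven't tried to adjust your algorithm myself, but I think the following approach should do what you're after. It replaces every non-alphanumeric character with a space and then lets `str.split()` do the splitting, so the last word is kept whether or not the sentence ends with punctuation.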
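```python
def sentence_split(sentence):
    new_sentence = sentence[:]  # start from the full input text
    for letter in sentence:
        if not letter.isalnum():
            # turn every punctuation mark or symbol into a space
            new_sentence = new_sentence.replace(letter, ' ')
    # split() with no argument splits on runs of whitespace,
    # so the last word is kept even without trailing punctuation
    return new_sentence.split()
```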
Running it now:
```
runfile(r'C:\Users\cat\test.py', wdir=r'C:\Users\cat')
['Hello', 'how', 'are', 'you']
```
Edit: fixed a bug in the initialization of `new_sentence`.
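Another option, not taken from the answer above and assuming Python 3, is to let the standard `re` module pull out the words directly. The pattern `[^\W_]+` matches runs of word characters other than the underscore, i.e. letters and digits; for ASCII input this should agree with the `isalnum()`-based versions, though some Unicode edge cases may differ.

```python
import re

# [^\W_] means "a word character that is not an underscore",
# i.e. a letter or a digit.
WORD_RE = re.compile(r"[^\W_]+")


def sentence_split(sentence):
    """Return the runs of letters/digits in sentence, dropping punctuation."""
    return WORD_RE.findall(sentence)


print(sentence_split("Hello, how are you"))  # ['Hello', 'how', 'are', 'you']
```

Because `findall` scans the string once, this also avoids calling `replace` once for every punctuation character in the input.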