I have a list of words, for example:
words = ['one','two','three four','five','six seven']
# a quote was missing here
I'm trying to create a new list in which every item is a single word, so that I end up with:
words = ['one','two','three','four','five','six','seven']
Is the best approach to join the whole list into one string and then tokenize that string? Something like this:
word_string = ' '.join(words)
tokenize_list = nltk.word_tokenize(word_string)  # nltk.tokenize is a module; word_tokenize is the callable
Or is there a better option?
Tig*_*kT3 10
words = ['one','two','three four','five','six seven']
With a loop:
words_result = []
for item in words:
    # split() with no arguments splits each item on whitespace
    for word in item.split():
        words_result.append(word)
Or as a comprehension:
words = [word for item in words for word in item.split()]
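For what it's worth, the same flattening can also be sketched with `itertools.chain.from_iterable` from the standard library. The helper name `split_items` is hypothetical and only there for illustration; it is not part of the answer above.

```python
from itertools import chain


def split_items(items):
    """Split every string on whitespace and flatten the pieces into one list."""
    # item.split() yields the words of one item; chain.from_iterable
    # concatenates those per-item word lists in their original order.
    return list(chain.from_iterable(item.split() for item in items))


words = ['one', 'two', 'three four', 'five', 'six seven']
print(split_items(words))
```

This should behave like the comprehension; which one to use is mostly a matter of taste.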
You can join with a space separator and then split again:
In [22]:
words = ['one','two','three four','five','six seven']
' '.join(words).split()
Out[22]:
['one', 'two', 'three', 'four', 'five', 'six', 'seven']
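A small sketch of why join-then-split and the per-item split should give the same result: `str.split()` called with no arguments treats any run of whitespace as a single separator and drops empty strings. The messy sample list below is made up for illustration and is not from the question.

```python
# Hypothetical input with stray spaces, a tab, and an empty item.
words = ['one', ' two ', 'three  four', '', 'six\tseven']

# Approach 1: join everything with spaces, then split on any whitespace.
joined_then_split = ' '.join(words).split()

# Approach 2: split each item on whitespace and flatten.
per_item_split = [word for item in words for word in item.split()]

# Contrast: an explicit ' ' separator keeps empty strings and ignores tabs.
with_explicit_space = ' '.join(words).split(' ')

print(joined_then_split)
print(per_item_split)
print(with_explicit_space)
```

Passing an explicit separator such as `split(' ')` changes the behaviour, so the no-argument form is probably what you want here.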