Python: split a list of multi-word strings into single words

GNM*_*O11 4 python nlp nltk

I have a list of words, for example:

words = ['one','two','three four','five','six seven'] # the quote was missing

I'm trying to create a new list in which each item is a single word, so I would end up with:

words = ['one','two','three','four','five','six','seven']

Is the best approach to join the whole list into one string and then tokenize that string? Something like this:

word_string = ' '.join(words)
tokenize_list = nltk.word_tokenize(word_string)  # nltk.tokenize is a module; word_tokenize is the function

Or is there a better option?

Tig*_*kT3 10

words = ['one','two','three four','five','six seven']

With a loop:

words_result = []
for item in words:
    for word in item.split():
        words_result.append(word)
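A small variation on the loop above, offered as a sketch rather than part of the original answer: the inner `for` loop can be replaced by `list.extend()`, which appends every word returned by `item.split()` in a single call. The result should be the same list.

```python
words = ['one', 'two', 'three four', 'five', 'six seven']

words_result = []
for item in words:
    # extend() appends each element of item.split() to words_result,
    # doing the job of the inner loop with its repeated append() calls.
    words_result.extend(item.split())

print(words_result)
# ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
```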

Or as a list comprehension:

words = [word for item in words for word in item.split()]
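The `for` clauses in the comprehension read left to right in the same order as the nested loops: the outer loop over items comes first, then the inner loop over the words of each item. As an alternative not mentioned in the thread, `itertools.chain.from_iterable` from the standard library concatenates the per-item word lists; this sketch checks that both give the same result.

```python
from itertools import chain

words = ['one', 'two', 'three four', 'five', 'six seven']

# Outer clause iterates over items, inner clause over the words in each item,
# matching the order of the nested for-loops.
words_comp = [word for item in words for word in item.split()]

# chain.from_iterable lazily concatenates the lists produced by split();
# list() materializes the result.
words_chain = list(chain.from_iterable(item.split() for item in words))

print(words_comp == words_chain)  # True
```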


EdC*_*ica 9

You can join the items with a space separator and then split the result again:

In [22]:

words = ['one','two','three four','five','six seven']
' '.join(words).split()
Out[22]:
['one', 'two', 'three', 'four', 'five', 'six', 'seven']

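  • Minor suggestion: you can call `split()` with no arguments to save three characters; unlike `split(' ')`, it also splits on any run of whitespace and ignores leading/trailing whitespace. (2 upvotes)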
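To illustrate the difference the comment points at, here is a sketch with made-up input items that contain doubled spaces, edge spaces and a tab (the sample data is hypothetical, not from the question). With no argument, `split()` treats any run of whitespace as one separator; `split(' ')` splits on each single space, so it produces empty strings and leaves the tab in place.

```python
items = ['three  four', ' five ', 'six\tseven']  # hypothetical messy input
joined = ' '.join(items)

# Any run of whitespace (spaces, tabs) is a single separator; no empty strings.
by_whitespace = joined.split()
print(by_whitespace)
# ['three', 'four', 'five', 'six', 'seven']

# Every single space is a separator, so consecutive spaces yield '' entries
# and the tab is not treated as a separator at all.
by_space = joined.split(' ')
print(by_space)
# ['three', '', 'four', '', 'five', '', 'six\tseven']
```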