Return a list of words after reading a file in Python

mzn*_*rft 7 python string list

I have a text file named test.txt. I want to read it and return a list of all the words in the file (with newlines removed).

This is my current code:

def read_words(words_file):
    open_file = open(words_file, 'r')
    words_list = []
    contents = open_file.readlines()
    for i in range(len(contents)):
        words_list.append(contents[i].strip('\n'))
    # close the file before returning; code after return never runs
    open_file.close()
    return words_list

Running this code produces the following list:

['hello there how is everything ', 'thank you all', 'again', 'thanks a lot']

I want the list to look like this:

['hello','there','how','is','everything','thank','you','all','again','thanks','a','lot']

mgi*_*son 20

Depending on the size of the file, this may be as simple as:

with open(file) as f:
    words = f.read().split()


And*_*ark 14

Replace the words_list.append(...) line in your for loop with the following:

words_list.extend(contents[i].split())

This splits each line on whitespace characters and then adds each element of the resulting list to words_list.
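
To make the difference between append and extend concrete, here is a minimal sketch. The `contents` list is a hypothetical stand-in for what `readlines()` might return for the first two lines of the file in the question; it is not read from a real file.

```python
# Hypothetical sample lines, mirroring the first two lines of the
# file contents implied by the question (trailing newlines included).
contents = ['hello there how is everything \n', 'thank you all\n']

appended = []
extended = []
for line in contents:
    # append adds the whole stripped line as a single list element
    appended.append(line.strip('\n'))
    # split() with no arguments splits on any run of whitespace and
    # discards leading/trailing whitespace, including the newline;
    # extend then adds each resulting word as its own element
    extended.extend(line.split())

print(appended)
print(extended)
```

With append, each line stays one string; with extend plus split(), the words end up as separate elements, and no explicit strip('\n') is needed.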

Or, as an alternative, rewrite the entire function as a list comprehension:

def read_words(words_file):
    return [word for line in open(words_file, 'r') for word in line.split()]


NPE*_*NPE 5

Here is how I would write it:

def read_words(words_file):
  with open(words_file, 'r') as f:
    ret = []
    for line in f:
      ret += line.split()
    return ret

print(read_words('test.txt'))

The function can be shortened a little using itertools, but personally I find the result less readable:

import itertools

def read_words(words_file):
  with open(words_file, 'r') as f:
    return list(itertools.chain.from_iterable(line.split() for line in f))

print(read_words('test.txt'))

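The nice thing about the second version is that it can be made entirely generator-based, which avoids holding all of the file's words in memory at once.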
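
A fully generator-based variant might look like the sketch below. The name `iter_words` is hypothetical (it does not appear in the answers above), and `yield from` assumes Python 3.3 or later.

```python
def iter_words(words_file):
    """Yield the words of a file one at a time.

    Only the current line is held in memory; no full word list is built.
    """
    with open(words_file, 'r') as f:
        for line in f:
            # split() with no arguments splits on any whitespace and
            # drops the trailing newline
            yield from line.split()
```

A caller could then loop with `for word in iter_words('test.txt'): ...`, or wrap the call in `list(...)` when a real list is needed. Note that the file stays open until the generator is exhausted or closed.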