Most efficient way to run multiple list comprehensions in Python

Question


Sea*_*ean 3 python list-comprehension nltk

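Given these three list comprehensions, is there a more efficient way to do this than three separate passes? I believe a for loop would probably be bad form here, but if I am iterating over a large number of rows in rowsaslist, the code below feels inefficient.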

import string
from nltk.corpus import stopwords

cachedStopWords = stopwords.words('english')

# Three passes over the same list: lowercase, strip punctuation, drop stop words.
rowsaslist = [x.lower() for x in rowsaslist]
rowsaslist = [''.join(c for c in s if c not in string.punctuation) for s in rowsaslist]
rowsaslist = [' '.join([word for word in p.split() if word not in cachedStopWords]) for p in rowsaslist]


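Would it be more efficient to combine all of these into a single comprehension statement? I know that, from a readability standpoint, it would probably be a mess of code.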

Answer 1

Eri*_*nil 5

Instead of iterating over the same list three times, you can simply define two functions and use them in a single list comprehension:

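import string
from nltk.corpus import stopwords

cachedStopWords = stopwords.words('english')


def remove_punctuation(text):
    # Lowercases the text and drops every punctuation character.
    return ''.join(c for c in text.lower() if c not in string.punctuation)

def remove_stop_words(text):
    # Drops every whitespace-separated word found in cachedStopWords.
    return ' '.join([word for word in text.split() if word not in cachedStopWords])

# Single pass over the list: each row goes through both functions once.
rowsaslist = [remove_stop_words(remove_punctuation(text)) for text in rowsaslist]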


I have never used stopwords. If it returns a list, it would be best to convert it to a set first, to speed up the word not in cachedStopWords test.
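The set conversion might look roughly like the sketch below. To keep it runnable without downloading NLTK data, it assumes a small hypothetical stop word list (stop_word_list) in place of stopwords.words('english'), and it passes the set in as a stop_words parameter, which is not part of the answer's code. The sample rows are made up for illustration.

```python
import string

# Hypothetical stand-in for stopwords.words('english'); an assumption so the
# sketch runs without NLTK data installed.
stop_word_list = ["the", "is", "a", "on", "and"]

# Convert once, up front: membership tests on a set take constant time on
# average, while testing against a list scans it element by element.
stop_word_set = set(stop_word_list)


def remove_punctuation(text):
    """Lowercase the text and drop ASCII punctuation characters."""
    return ''.join(c for c in text.lower() if c not in string.punctuation)


def remove_stop_words(text, stop_words=stop_word_set):
    """Keep only whitespace-separated words that are not in stop_words."""
    return ' '.join(word for word in text.split() if word not in stop_words)


# Made-up sample rows, for illustration only.
rowsaslist = ["The cat is on the mat.", "A dog, and a bird!"]
cleaned = [remove_stop_words(remove_punctuation(text)) for text in rowsaslist]
print(cleaned)  # ['cat mat', 'dog bird']
```

With the real NLTK list, the only change should be building the set from stopwords.words('english') instead of the stand-in list; the gain grows with the number of words tested, since each test no longer scans the whole list.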

Finally, the NLTK package can help you process text. See @alvas's answer.

Archived:	8 years, 5 months ago
Views:	272
Last activity:	6 years, 10 months ago