相关疑难解决方法(0)

我正在使用Python 3.5.2

我有两个清单

所以,我必须循环750,000个句子并执行大约20,000次替换,但是只有我的话实际上是"单词"并且不是更大字符串的一部分.

我是通过预先编译我的文字来做到这一点的,这样它们就被\b元字符所包围

compiled_words = [re.compile(r'\b' + word + r'\b') for word in my20000words]

然后我循环我的"句子"

import re

for sentence in sentences:
  for word in compiled_words:
    sentence = re.sub(word, "", sentence)
  # put sentence into a growing list

这个嵌套循环每秒处理大约50个句子,这很好,但是处理我的所有句子仍需要几个小时.

谢谢你的任何建议.

117
推荐指数

7
解决办法

2万
查看次数