I'm having trouble with the multiprocessing module. I'm using a pool of workers with its map method to load data from a large number of files, and each file is analyzed with a custom function. Each time a file is processed I would like to update a counter so I can keep track of how many files remain to be processed. Here is some sample code:
import os
from multiprocessing import Pool

def analyze_data(args):
    # do something with the file
    counter += 1   # fails: counter is neither global here nor shared across processes
    print(counter)

if __name__ == '__main__':
    list_of_files = os.listdir(some_directory)
    global counter
    counter = 0
    p = Pool()
    p.map(analyze_data, list_of_files)
I can't find a solution for this.
Which n-gram implementation is fastest in Python?
I tried to profile nltk's ngrams vs Scott's zip approach (http://locallyoptimal.com/blog/2013/01/20/elegant-n-gram-generation-in-python/):
from nltk.util import ngrams as nltkngram
import this, time

def zipngram(text, n=2):
    return zip(*[text.split()[i:] for i in range(n)])

text = this.s  # the rot13-encoded Zen of Python, used as sample text

start = time.time()
nltkngram(text.split(), n=2)
print(time.time() - start)

start = time.time()
zipngram(text, n=2)
print(time.time() - start)
[OUT]
0.000213146209717
6.50882720947e-05
Is there a faster implementation for generating n-grams in Python?