Posts by bha*_*udi

Thread pool in Python is not as fast as expected

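I am a beginner in Python and machine learning. I am trying to reproduce what countvectorizer() does, using multithreading. I am working with the Yelp dataset and using LogisticRegression. This is what I have written so far: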

Code snippet:

from multiprocessing.dummy import Pool as ThreadPool
from threading import Thread, current_thread
from functools import partial
data = df['text']
rev = df['stars'] 


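y = []


def product_helper(args):
    # Unpack one (text, stars) tuple so the pool can map over a single argument.
    return featureExtraction(*args)


def featureExtraction(p, t):
    # Count how often each vocabulary word occurs in the review text p.
    # bag_of_words is not defined in this snippet; t (the star rating) is unused.
    temp = [0] * len(bag_of_words)
    for word in p.split():
        if word in bag_of_words:
            temp[bag_of_words.index(word)] += 1

    return temp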


# function to be mapped over
def calculateParallel(threads): 
    pool = ThreadPool(threads)
    job_args = [(item_a, rev[i]) for i, item_a in enumerate(data)]
    l …
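
A note on why the thread pool above may not help: `multiprocessing.dummy.Pool` is a pool of threads, and in CPython pure-Python, CPU-bound work such as this counting loop is serialized by the global interpreter lock, so adding threads does not add parallelism. Separately, if `bag_of_words` is a list, both `word in bag_of_words` and `bag_of_words.index(word)` scan it linearly for every word. The sketch below is one possible single-threaded rewrite, not the original author's solution. It assumes `bag_of_words` is a list of strings; `build_index`, `feature_extraction`, and the sample data are hypothetical names and values used only for illustration.

```python
def build_index(bag_of_words):
    """Map each vocabulary word to its first position in the list."""
    index = {}
    for position, word in enumerate(bag_of_words):
        # setdefault keeps the first occurrence, matching list.index().
        index.setdefault(word, position)
    return index


def feature_extraction(text, word_index, vocab_size):
    """Return a count vector for text using constant-time dict lookups."""
    counts = [0] * vocab_size
    for word in text.split():
        position = word_index.get(word)
        if position is not None:
            counts[position] += 1
    return counts


# Hypothetical sample data standing in for the vocabulary and df['text'].
bag_of_words = ["good", "bad", "food"]
reviews = ["good food good", "bad service"]

word_index = build_index(bag_of_words)
features = [
    feature_extraction(text, word_index, len(bag_of_words))
    for text in reviews
]
print(features)  # [[2, 0, 1], [0, 1, 0]]
```

If parallelism is still wanted after this change, a process-based `multiprocessing.Pool` avoids the interpreter lock, at the cost of sending each review and its result between processes.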

python multithreading machine-learning feature-extraction

3 votes
1 answer
3216 views