When to use threads and how many threads to use

Question

boy*_*ode 2 python multithreading python-multithreading

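I have a project for work. We wrote a module and left a #TODO in it to implement threading to improve the module. I'm a fairly new Python programmer and decided to give it a try. While learning and implementing threading, I ran into a question along the lines of "how many threads is too many?" We have a queue of about 6 objects to process, so why create 6 threads (or any threads at all) to process objects in a list or queue when the processing time is negligible anyway? (Each object takes at most about 2 seconds to process.)

So I ran a small experiment. I wanted to know whether using threads gives any performance gain. See my Python code below: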

import threading
import queue
import math
import time

results_total = []
results_calculation = []
results_threads = []

class MyThread(threading.Thread):
    def __init__(self, thread_id, q):
        threading.Thread.__init__(self)
        self.threadID = thread_id
        self.q = q

    def run(self):
        # print("Starting " + self.name)
        process_data(self.q)
        # print("Exiting " + self.name)


def process_data(q):
    while not exitFlag:
        queueLock.acquire()
        if not q.empty():
            data = q.get()
            # 0 and 1 are not prime; every other number starts as a candidate
            potentially_prime = data > 1
            queueLock.release()
            # check if the data is a prime number
            # print("Testing {0} for primality.".format(data))
            for i in range(2, int(math.sqrt(data)+1)):
                if data % i == 0:
                    potentially_prime = False
                    break
            if potentially_prime is True:
                prime_numbers.append(data)
        else:
            queueLock.release()

for j in [1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 250, 500,
          750, 1000, 2500, 5000, 10000]:
    threads = []
    numberList = list(range(1, 10001))
    queueLock = threading.Lock()
    workQueue = queue.Queue()
    numberThreads = j
    prime_numbers = list()
    exitFlag = 0

    start_time_total = time.time()
    # Create new threads
    for threadID in range(0, numberThreads):
        thread = MyThread(threadID, workQueue)
        thread.start()
        threads.append(thread)

    # Fill the queue
    queueLock.acquire()
    # print("Filling the queue...")
    for number in numberList:
        workQueue.put(number)
    queueLock.release()
    # print("Queue filled...")
    start_time_calculation = time.time()
    # Wait for queue to empty
    while not workQueue.empty():
        pass

    # Notify threads it's time to exit
    exitFlag = 1

    # Wait for all threads to complete
    for t in threads:
        t.join()
    # print("Exiting Main Thread")
    # print(prime_numbers)
    end_time = time.time()
    results_total.append(
            "The test took {0} seconds for {1} threads.".format(
                end_time - start_time_total, j)
            )
    results_calculation.append(
            "The calculation took {0} seconds for {1} threads.".format(
                    end_time - start_time_calculation, j)
            )
    results_threads.append(
            "The thread setup time took {0} seconds for {1} threads.".format(
                    start_time_calculation - start_time_total, j)
            )
for result in results_total:
    print(result)
for result in results_calculation:
    print(result)
for result in results_threads:
    print(result)

This test finds the prime numbers between 1 and 10000. The setup is taken almost directly from https://www.tutorialspoint.com/python3/python_multithreading.htm, but instead of printing a simple string, I ask the threads to find prime numbers. This is not what my real-world application does, but I can't currently test the code I've written for the module, so I thought this would be a good test to measure the effect of additional threads. My real-world application talks to multiple serial devices. I ran the test 5 times and averaged the times. Here are the results in a graph:

My questions regarding threading and this test are as follows:

1. Is this test even a good representation of how threads should be used? This is not a server/client situation. In terms of efficiency, is it better to avoid parallelism when you aren't serving clients or dealing with assignments/work being added to a queue?
2. If the answer to 1 is "No, this test isn't a place where one should use threads," then, generally speaking, when is it appropriate?
3. If the answer to 1 is "Yes, it is OK to use threads in that case," why does adding threads make the test take longer and quickly reach a plateau? Put differently, why would one want to use threads when it takes many times longer than calculating the result in a loop?

I notice that as the work to threads ratio gets closer to 1:1, the time taken to set up the threads becomes longer. So is threading only useful where you create threads once and keep them alive as long as possible to handle requests that might enqueue faster than they can be calculated?

Answer 1

Dan*_*man 7

No, this is not a good place to use threads.

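Generally, you want to use threads where your code is IO-bound; that is, where it spends a significant amount of time waiting on input or output. An example might be downloading data from a list of URLs in parallel: the code can start requesting data from the next URL while it is still waiting for the previous one to return.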
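
As a rough illustration of the IO-bound case, here is a minimal sketch. It assumes a hypothetical `fake_read` function that stands in for a blocking call (such as a download or a serial read) by sleeping; the function names, the 0.2 second delay, and the six device IDs are all made up for the example and are not taken from the question's module.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_read(device_id, delay=0.2):
    """Hypothetical stand-in for a blocking IO call (e.g. a serial read)."""
    time.sleep(delay)  # the thread just waits here, so other threads can run
    return device_id * 10


def read_all_sequential(device_ids):
    # Each read waits for the previous one to finish.
    return [fake_read(d) for d in device_ids]


def read_all_threaded(device_ids):
    # One worker per device: the waits overlap instead of adding up.
    with ThreadPoolExecutor(max_workers=len(device_ids)) as pool:
        return list(pool.map(fake_read, device_ids))


def timed(func, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    devices = list(range(6))
    seq_result, seq_time = timed(read_all_sequential, devices)
    thr_result, thr_time = timed(read_all_threaded, devices)
    print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

Because the simulated reads only wait, the threaded version should finish in roughly the time of one read rather than six. Real devices or network calls will behave less neatly, so treat the numbers as indicative only.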

The test in the question is not such a case: calculating prime numbers is CPU-bound.
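
To see the CPU-bound case in isolation, the sketch below runs the same primality check in a plain loop and through a thread pool. It is a simplified rewrite for illustration, not the question's code: `is_prime`, `primes_loop`, `primes_threaded`, and the worker count of 4 are assumptions. In a standard CPython build with the global interpreter lock, only one thread executes Python bytecode at a time, so the threaded version is not expected to be faster and usually pays extra scheduling overhead.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    """Trial division up to the integer square root of n."""
    if n < 2:
        return False
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True


def primes_loop(numbers):
    # Plain single-threaded loop.
    return [n for n in numbers if is_prime(n)]


def primes_threaded(numbers, workers=4):
    # Same work spread over threads; pool.map keeps the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(is_prime, numbers))
    return [n for n, flag in zip(numbers, flags) if flag]


if __name__ == "__main__":
    numbers = range(1, 10001)
    for func in (primes_loop, primes_threaded):
        start = time.perf_counter()
        found = func(numbers)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__}: {len(found)} primes in {elapsed:.3f}s")
```

Both functions return the same list; only the elapsed time differs. For work that really is CPU-bound, a process pool such as `concurrent.futures.ProcessPoolExecutor` is the more usual choice, although for a job this small the startup cost may outweigh any gain.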
