Limiting the concurrency and rate of Python threads

OG *_*ude 5 python concurrency multithreading rate-limiting

Given a fixed number of threads, I want to limit calls to the worker function to one per second.

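My idea is to track the time of the last call across all threads and compare it with the current time in each thread. If current_time - last_time < rate, I put the thread to sleep for the remainder. Something is wrong with my implementation, and I think I may have the wrong idea about how locks work.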

My code:

from queue import Queue
from threading import Thread, Lock
import time

num_worker_threads = 2
rate = 1
q = Queue()
lock = Lock()
last_time = [time.time()]

def do_work(i, idx):
    # Do work here, print is just a dummy.
    print('Thread: {0}, Item: {1}, Time: {2}'.format(i, idx, time.time()))

def worker(i):
    while True:
        lock.acquire()
        current_time = time.time()
        interval = current_time - last_time[0]
        last_time[0] = current_time
        if interval < rate:
            time.sleep(rate - interval)
        lock.release()
        item = q.get()
        do_work(i, item)
        q.task_done()

for i in range(num_worker_threads):
    t = Thread(target=worker, args=[i])
    t.daemon = True
    t.start()

for item in range(10):
    q.put(item)

q.join()

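I expected to see one call to do_work per second, but instead I get two calls at almost the same time (one per thread), followed by a one-second pause. What's wrong?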


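OK, some edits. The suggestion to simply limit the rate at which items are put onto the queue is good, but I remembered that I have to handle the case where workers re-add items to the queue. Typical examples: pagination, or retries with back-off in network tasks. I came up with the following. I suspect that for real network tasks the eventlet/gevent libraries would be lighter on resources, but this is just an example. It uses a priority queue to pile up requests and an extra thread to shovel items from the pile onto the actual task queue at an even rate. I simulated workers reinserting items onto the pile; reinserted items are then processed first.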

import random
import time

from queue import Queue, PriorityQueue
from threading import Thread

rate = 0.1

def worker(q, q_pile, idx):
    while True:
        item = q.get()
        print("Thread: {0} processed: {1}".format(idx, item[1]))
        if random.random() > 0.3:
            print("Thread: {1} reinserting item: {0}".format(item[1], idx))
            q_pile.put((-1 * time.time(), item[1]))
        q.task_done()

def schedule(q_pile, q):
    while True:
        if not q_pile.empty():
            print("Items on pile: {0}".format(q_pile.qsize()))
            q.put(q_pile.get())
            q_pile.task_done()
        time.sleep(rate)

def main():

    q_pile = PriorityQueue()
    q = Queue()

    for i in range(5):
        t = Thread(target=worker, args=[q, q_pile, i])
        t.daemon = True
        t.start()

    t_schedule = Thread(target=schedule, args=[q_pile, q])
    t_schedule.daemon = True
    t_schedule.start()

    for i in range(10):
        q_pile.put((-1 * time.time(), i))
    q_pile.join()
    q.join()

if __name__ == '__main__':
    main()
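
The edit relies on PriorityQueue returning the smallest tuple first: negating time.time() makes the most recently queued entry the smallest key, so the pile is served newest-first. The sketch below uses hypothetical fixed timestamps in place of time.time() to show the resulting order; drain_order is an illustrative helper, not part of the code above.

```python
from queue import PriorityQueue


def drain_order(entries):
    """Put (priority, item) tuples on a PriorityQueue and return the items in get() order."""
    pile = PriorityQueue()
    for entry in entries:
        pile.put(entry)
    order = []
    while not pile.empty():
        order.append(pile.get()[1])
    return order


# Hypothetical timestamps standing in for time.time(): larger t means queued later.
# Negating t turns the newest entry into the smallest key, so it is retrieved first.
entries = [(-t, item) for t, item in [(100.0, "a"), (101.0, "b"), (102.0, "retry-a")]]
print(drain_order(entries))  # ['retry-a', 'b', 'a']
```

Note that newest-first applies to every entry, including the initial batch of ten items, not only to reinserted ones; entries that share a timestamp fall back to comparing the items themselves.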

Joc*_*zel 1

I get two calls at almost the same time (one per thread), followed by a one-second pause. What's wrong?

That is exactly what your implementation should produce. Suppose time t starts at 0 and the rate is 1:

Thread 1 does this:

    lock.acquire() # both threads wait here, one gets the lock
    current_time = time.time() # we start at t=0
    interval = current_time - last_time[0] # so interval = 0
    last_time[0] = current_time # last_time = t = 0
    if interval < rate: # rate = 1 so we sleep
        time.sleep(rate - interval) # to t=1
    lock.release() # now the other thread wakes up
    # it's t=1 and we do the job

Thread 2 does this:

    lock.acquire() # we get the lock at t=1 
    current_time = time.time() # still t=1
    interval = current_time - last_time[0] # interval = 1
    last_time[0] = current_time
    if interval < rate: # interval = rate = 1 so we don't sleep
        time.sleep(rate - interval)
    lock.release() 
    # both threads start the work around t=1
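
The root cause is where last_time[0] = current_time sits: it records the time before the sleep, so the next thread measures its interval from the start of the previous sleep and already sees a full rate elapsed. The sketch below replays the lock-protected section on a simulated clock instead of real threads (call_times and record_after_sleep are hypothetical names). It assumes do_work() takes no time and the lock is handed over instantly, so the critical sections simply run back to back.

```python
def call_times(n_calls, rate, record_after_sleep):
    """Replay worker()'s critical section n_calls times; return when each do_work() runs."""
    now = 0.0        # simulated clock, t = 0 at start
    last_time = 0.0  # like last_time = [time.time()] at start-up
    times = []
    for _ in range(n_calls):
        interval = now - last_time
        if not record_after_sleep:
            last_time = now          # original code: timestamp taken before sleeping
        if interval < rate:
            now += rate - interval   # time.sleep() while holding the lock
        if record_after_sleep:
            last_time = now          # variant: timestamp taken after sleeping
        times.append(now)            # do_work() runs right after lock.release()
    return times


print(call_times(4, 1, record_after_sleep=False))  # [1.0, 1.0, 2.0, 2.0]
print(call_times(4, 1, record_after_sleep=True))   # [1.0, 2.0, 3.0, 4.0]
```

The first run reproduces the pairs of calls from the question; the second spaces them one rate apart. In the real worker this corresponds to setting last_time[0] = time.time() after time.sleep(...) and before lock.release().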

My suggestion is to limit the rate at which you put items onto the queue.
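
A minimal sketch of that suggestion, assuming Python 3: the producer sleeps rate seconds between put() calls, and the workers process items as soon as they arrive. The run() helper, the calls list, and the None sentinel used to stop the workers are illustrative additions, not part of the question's code.

```python
import threading
import time
from queue import Queue


def run(num_items, num_workers, rate):
    """Throttle the producer, not the workers; return the (time, item) pairs the workers saw."""
    q = Queue()
    calls = []
    calls_lock = threading.Lock()  # protects the shared calls list

    def worker():
        while True:
            item = q.get()
            if item is None:  # sentinel: stop this worker
                q.task_done()
                return
            with calls_lock:
                calls.append((time.monotonic(), item))  # stand-in for do_work()
            q.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for t in threads:
        t.start()

    for item in range(num_items):
        q.put(item)
        time.sleep(rate)  # the rate limit lives here, not in the workers

    for _ in threads:
        q.put(None)
    q.join()
    return calls


calls = run(num_items=5, num_workers=2, rate=0.2)
for t, item in calls:
    print("Item: {0}, Time: {1:.2f}".format(item, t - calls[0][0]))
```

Because the limit is applied on enqueue, items that a worker puts back on the queue bypass it unless they are routed through the same throttled producer, which is the case the question's edit handles with its scheduler thread.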