Logging threads when using ThreadPoolExecutor

goc*_*oph 5 python python-multithreading

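I am using ThreadPoolExecutor from Python's concurrent.futures to scrape in parallel and write the results to a database. While doing so, I realized that I get no information at all if one of the threads fails. How can I properly find out which threads failed and why (that is, with the "normal" traceback)? Below is a minimal working example.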

import logging
logging.basicConfig(format='%(asctime)s  %(message)s', 
    datefmt='%y-%m-%d %H:%M:%S', level=logging.INFO)
from concurrent.futures import ThreadPoolExecutor

def worker_bee(seed):
    # sido is not defined intentionally to break the code
    result = seed + sido
    return result

# uncomment next line, and you will get the usual traceback
# worker_bee(1)

# ThreadPoolExecutor will not provide any traceback
logging.info('submitting all jobs to the queue')
with ThreadPoolExecutor(max_workers=4) as executor:
    for seed in range(0,10):
        executor.submit(worker_bee, seed)
    logging.info(f'submitted, waiting for threads to finish')

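If I import logging inside worker_bee() and send messages to the root logger, I can see those messages in the final log. But I only see the log messages I defined myself, not the traceback showing where the code actually failed.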

mar*_*eau 3

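You can get a "normal traceback" by retrieving the result of each future returned by executor.submit(). Calling result() on a future waits for that job to finish, which gives the thread time to start executing (and possibly fail), and then re-raises any exception that occurred in the worker.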

Here's what I mean:

from concurrent.futures import ThreadPoolExecutor
import logging

logging.basicConfig(format='%(asctime)s  %(message)s',
                    datefmt='%y-%m-%d %H:%M:%S', level=logging.INFO)

def worker_bee(seed):
    # sido is not defined intentionally to break the code
    result = seed + sido
    return result

logging.info('submitting all jobs to the queue')
with ThreadPoolExecutor(max_workers=4) as executor:
    results = []
    for seed in range(10):
        result = executor.submit(worker_bee, seed)
        results.append(result)
    logging.info(f'submitted, waiting for threads to finish')

for result in results:
    print(result.result())

Output: because every call to worker_bee fails, the first call to result.result() re-raises the worker's exception in the main thread, and the script ends with the usual traceback for NameError: name 'sido' is not defined.
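Retrieving results as above stops at the first failing future. If you would rather record every failure and keep going, one possible variation is sketched below. It assumes you want each failure logged together with the seed that caused it; run_all and future_to_seed are names made up for this illustration and are not part of the original code.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(format='%(asctime)s  %(message)s',
                    datefmt='%y-%m-%d %H:%M:%S', level=logging.INFO)

def worker_bee(seed):
    # sido is not defined intentionally to break the code
    result = seed + sido
    return result

def run_all(seeds, max_workers=4):
    """Hypothetical helper: run worker_bee per seed, return (results, failures)."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its seed so a failure can be attributed.
        future_to_seed = {executor.submit(worker_bee, seed): seed
                          for seed in seeds}
        for future in as_completed(future_to_seed):
            seed = future_to_seed[future]
            exc = future.exception()  # None if the job succeeded
            if exc is None:
                results[seed] = future.result()
            else:
                failures[seed] = exc
                # Passing the exception instance as exc_info makes logging
                # print the traceback captured in the worker thread.
                logging.error('worker_bee(%s) failed', seed, exc_info=exc)
    return results, failures

if __name__ == '__main__':
    results, failures = run_all(range(10))
    logging.info('%d succeeded, %d failed', len(results), len(failures))
```

Each failed job then shows up in the log with its seed and full traceback, and the remaining jobs still run to completion.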