使用多处理疯狂记录

Tri*_*itt 5 python logging multiprocessing

我试图在多处理场景中使用python的默认日志记录模块.我读了:

  1. Python多进程,日志记录,各种类
  2. 使用多处理记录

和其他多篇关于多处理,日志记录,python类等的帖子.在完成所有这些阅读后,我来到这段代码,我无法正常运行使用python的logutils QueueHandler:

import sys
import logging
from logging import INFO
from multiprocessing import Process, Queue as mpQueue
import threading
import time

from logutils.queue import QueueListener, QueueHandler


class Worker(Process):

    def __init__(self, n, q):
        super(Worker, self).__init__()
        self.n = n
        self.queue = q

        self.qh = QueueHandler(self.queue)
        self.root = logging.getLogger()
        self.root.addHandler(self.qh)
        self.root.setLevel(logging.DEBUG)        
        self.logger = logging.getLogger("W%i"%self.n)


    def run(self):
        self.logger.info("Worker %i Starting"%self.n)

        for i in xrange(10):
            self.logger.log(INFO, "testing %i"%i)

        self.logger.log(INFO, "Completed %i"%self.n)


def listener_process(queue):
    while True:
        try:
            record = queue.get()
            if record is None:
                break
            logger = logging.getLogger(record.name)
            logger.handle(record)
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            import sys, traceback
            print >> sys.stderr, 'Whoops! Problem:'
            traceback.print_exc(file=sys.stderr)

if __name__ == "__main__":

    mpq = mpQueue(-1)

    root = logging.getLogger()
    h = logging.StreamHandler()    
    f = logging.Formatter('%(asctime)s %(processName)-10s %(name)s %(levelname)-8s     %(message)s')
    h.setFormatter(f)
    root.addHandler(h)

    l = logging.getLogger("Test")
    l.setLevel(logging.DEBUG)

    listener = Process(target=listener_process,
                       args=(mpq,))
    listener.start()
    workers=[]
    for i in xrange(1):
        worker = Worker(i, mpq)
        worker.daemon = True
        worker.start()
        workers.append(worker)

    for worker in workers:
        worker.join()

    mpq.put_nowait(None)
    listener.join()

    for i in xrange(10):
        l.info("testing %i"%i)

    print "Finish"
Run Code Online (Sandbox Code Playgroud)

如果代码被执行,输出会以某种方式重复以下行:

2013-12-02 16:44:46,002 Worker-2   W0 INFO         Worker 0 Starting
2013-12-02 16:44:46,003 Worker-2   W0 INFO         testing 0
2013-12-02 16:44:46,003 Worker-2   W0 INFO         testing 1
2013-12-02 16:44:46,003 Worker-2   W0 INFO         testing 2
2013-12-02 16:44:46,002 Worker-2   W0 INFO         Worker 0 Starting
2013-12-02 16:44:46,003 Worker-2   W0 INFO         testing 3
2013-12-02 16:44:46,003 Worker-2   W0 INFO         testing 0
2013-12-02 16:44:46,003 Worker-2   W0 INFO         testing 1
2013-12-02 16:44:46,003 Worker-2   W0 INFO         testing 4
2013-12-02 16:44:46,003 Worker-2   W0 INFO         testing 2
2013-12-02 16:44:46,003 Worker-2   W0 INFO         testing 3
2013-12-02 16:44:46,003 Worker-2   W0 INFO         testing 5
2013-12-02 16:44:46,003 Worker-2   W0 INFO         testing 4
2013-12-02 16:44:46,003 Worker-2   W0 INFO         testing 6
2013-12-02 16:44:46,003 Worker-2   W0 INFO         testing 5
2013-12-02 16:44:46,004 Worker-2   W0 INFO         testing 7
2013-12-02 16:44:46,003 Worker-2   W0 INFO         testing 6
2013-12-02 16:44:46,004 Worker-2   W0 INFO         testing 8
2013-12-02 16:44:46,004 Worker-2   W0 INFO         testing 7
2013-12-02 16:44:46,004 Worker-2   W0 INFO         testing 9
2013-12-02 16:44:46,004 Worker-2   W0 INFO         testing 8
2013-12-02 16:44:46,004 Worker-2   W0 INFO         Completed 0
2013-12-02 16:44:46,004 Worker-2   W0 INFO         testing 9
2013-12-02 16:44:46,004 Worker-2   W0 INFO         Completed 0
2013-12-02 16:44:46,005 MainProcess Test INFO         testing 0
2013-12-02 16:44:46,005 MainProcess Test INFO         testing 1
2013-12-02 16:44:46,005 MainProcess Test INFO         testing 2
2013-12-02 16:44:46,005 MainProcess Test INFO         testing 3
2013-12-02 16:44:46,005 MainProcess Test INFO         testing 4
2013-12-02 16:44:46,005 MainProcess Test INFO         testing 5
2013-12-02 16:44:46,006 MainProcess Test INFO         testing 6
2013-12-02 16:44:46,006 MainProcess Test INFO         testing 7
2013-12-02 16:44:46,006 MainProcess Test INFO         testing 8
2013-12-02 16:44:46,006 MainProcess Test INFO         testing 9
Finish
Run Code Online (Sandbox Code Playgroud)

在其他问题中,建议处理程序不止一次添加,但是,正如您所看到的,我只在方法中添加了一次streamhanlder .我已经测试过将main方法嵌入到具有相同结果的类中.

编辑:正如@max建议的(或者我相信他说的)我已经修改了worker类的代码:

class Worker(Process):

    root = logging.getLogger()
    qh = None

    def __init__(self, n, q):
        super(Worker, self).__init__()
        self.n = n
        self.queue = q

        if not self.qh:
            Worker.qh = QueueHandler(self.queue)            
            Worker.root.addHandler(self.qh)
            Worker.root.setLevel(logging.DEBUG)

        self.logger = logging.getLogger("W%i"%self.n)

        print self.root.handlers

    def run(self):
        self.logger.info("Worker %i Starting"%self.n)

        for i in xrange(10):
            self.logger.log(INFO, "testing %i"%i)

        self.logger.log(INFO, "Completed %i"%self.n)
Run Code Online (Sandbox Code Playgroud)

使用相同的结果,现在队列处理程序不会一次又一次地添加,但仍然存在重复的日志条目,即使只有一个工作程序也是如此.

编辑2:我已经改变了一点代码.我更改了侦听器进程,现在使用了QueueListener(这就是我在开始时的意图),将主代码移动到了一个类.

import sys

import logging
from logging import INFO
from multiprocessing import Process, Queue as mpQueue
import threading
import time

from logutils.queue import QueueListener, QueueHandler

root = logging.getLogger()
added_qh = False

class Worker(Process):

    def __init__(self, logconf, n, qh):
        super(Worker, self).__init__()
        self.n = n
        self.logconf = logconf

#        global root
        global added_qh

        if not added_qh:
            added_qh = True
            root.addHandler(qh)
            root.setLevel(logging.DEBUG)            

        self.logger = logging.getLogger("W%i"%self.n)

        #print root.handlers

    def run(self):
        self.logger.info("Worker %i Starting"%self.n)

        for i in xrange(10):
            self.logger.log(INFO, "testing %i"%i)

        self.logger.log(INFO, "Completed %i"%self.n)


class Main(object):

    def __init__(self):
        pass

    def start(self):

        mpq = mpQueue(-1)
        qh = QueueHandler(mpq)

        h = logging.StreamHandler()

        ql = QueueListener(mpq, h)

        #h.setFormatter(f)
        root.addHandler(qh)

        l = logging.getLogger("Test")
        l.setLevel(logging.DEBUG)

        workers=[]

        for i in xrange(15):
            worker = Worker(logconf, i, qh)
            worker.daemon = True
            worker.start()
            workers.append(worker)

        for worker in workers:
            print "joining worker: {}".format(worker)
            worker.join()

        mpq.put_nowait(None)

        ql.start()

        # listener.join()

        for i in xrange(10):
            l.info("testing %i"%i)

if __name__ == "__main__":


    x = Main()
    x.start()

    time.sleep(10)

    print "Finish"
Run Code Online (Sandbox Code Playgroud)

现在它主要起作用,直到我达到一定数量的工人(~15),因为某些原因主要课程在de join中被阻止而其他工作人员什么都不做.

Ger*_*Yin 7

我来晚了,所以你可能不再需要答案了。问题来自这样一个事实,即您已经在主进程中设置了一个处理程序,而在您的工作人员中又添加了一个处理程序。这意味着在您的工作进程中,实际上有两个处理程序在管理您的数据,一个将日志推送到队列,另一个写入流。

您可以通过self.root.handlers = []在代码中添加额外的行来解决此问题。从您的原始代码中,__init__worker的方法如下所示:

def __init__(self, n, q):
    super(Worker, self).__init__()
    self.n = n
    self.queue = q

    self.qh = QueueHandler(self.queue)
    self.root = logging.getLogger()
    self.root.handlers = []
    self.root.addHandler(self.qh)
    self.root.setLevel(logging.DEBUG)
    self.logger = logging.getLogger("W%i"%self.n)
Run Code Online (Sandbox Code Playgroud)

输出现在看起来像这样:

python workers.py 
2016-05-12 10:07:02,971 Worker-2   W0 INFO         Worker 0 Starting
2016-05-12 10:07:02,972 Worker-2   W0 INFO         testing 0
2016-05-12 10:07:02,973 Worker-2   W0 INFO         testing 1
2016-05-12 10:07:02,973 Worker-2   W0 INFO         testing 2
2016-05-12 10:07:02,973 Worker-2   W0 INFO         testing 3
2016-05-12 10:07:02,973 Worker-2   W0 INFO         testing 4
2016-05-12 10:07:02,973 Worker-2   W0 INFO         testing 5
2016-05-12 10:07:02,973 Worker-2   W0 INFO         testing 6
2016-05-12 10:07:02,973 Worker-2   W0 INFO         testing 7
2016-05-12 10:07:02,973 Worker-2   W0 INFO         testing 8
2016-05-12 10:07:02,973 Worker-2   W0 INFO         testing 9
2016-05-12 10:07:02,973 Worker-2   W0 INFO         Completed 0
Finish
Run Code Online (Sandbox Code Playgroud)

  • 迟到了,但仍然非常相关:) 谢谢:) (2认同)

Max*_*oel 1

您的所有Workers 共享相同的根记录器对象(在调用始终返回相同的记录器中获得Worker.__init__getLogger。但是,每次创建 时,都会向该记录器Worker添加一个处理程序 ( )。QueueHandler

因此,如果您创建 10 个 Worker,您的根记录器上将有 10 个(相同的)处理程序,这意味着输出会重复 10 次。

相反,您应该使记录器成为模块属性而不是实例属性,并在模块级别配置它一次 - 而不是在类级别。

(实际上,记录器应该在程序级别配置一次)