Tri*_*itt 5 python logging multiprocessing
我试图在多处理场景中使用python的默认日志记录模块.我读了:
和其他多篇关于多处理,日志记录,python类等的帖子.在完成所有这些阅读后,我来到这段代码,我无法正常运行使用python的logutils QueueHandler:
import sys
import logging
from logging import INFO
from multiprocessing import Process, Queue as mpQueue
import threading
import time
from logutils.queue import QueueListener, QueueHandler
class Worker(Process):
def __init__(self, n, q):
super(Worker, self).__init__()
self.n = n
self.queue = q
self.qh = QueueHandler(self.queue)
self.root = logging.getLogger()
self.root.addHandler(self.qh)
self.root.setLevel(logging.DEBUG)
self.logger = logging.getLogger("W%i"%self.n)
def run(self):
self.logger.info("Worker %i Starting"%self.n)
for i in xrange(10):
self.logger.log(INFO, "testing %i"%i)
self.logger.log(INFO, "Completed %i"%self.n)
def listener_process(queue):
while True:
try:
record = queue.get()
if record is None:
break
logger = logging.getLogger(record.name)
logger.handle(record)
except (KeyboardInterrupt, SystemExit):
raise
except:
import sys, traceback
print >> sys.stderr, 'Whoops! Problem:'
traceback.print_exc(file=sys.stderr)
if __name__ == "__main__":
mpq = mpQueue(-1)
root = logging.getLogger()
h = logging.StreamHandler()
f = logging.Formatter('%(asctime)s %(processName)-10s %(name)s %(levelname)-8s %(message)s')
h.setFormatter(f)
root.addHandler(h)
l = logging.getLogger("Test")
l.setLevel(logging.DEBUG)
listener = Process(target=listener_process,
args=(mpq,))
listener.start()
workers=[]
for i in xrange(1):
worker = Worker(i, mpq)
worker.daemon = True
worker.start()
workers.append(worker)
for worker in workers:
worker.join()
mpq.put_nowait(None)
listener.join()
for i in xrange(10):
l.info("testing %i"%i)
print "Finish"
Run Code Online (Sandbox Code Playgroud)
如果代码被执行,输出会以某种方式重复以下行:
2013-12-02 16:44:46,002 Worker-2 W0 INFO Worker 0 Starting
2013-12-02 16:44:46,003 Worker-2 W0 INFO testing 0
2013-12-02 16:44:46,003 Worker-2 W0 INFO testing 1
2013-12-02 16:44:46,003 Worker-2 W0 INFO testing 2
2013-12-02 16:44:46,002 Worker-2 W0 INFO Worker 0 Starting
2013-12-02 16:44:46,003 Worker-2 W0 INFO testing 3
2013-12-02 16:44:46,003 Worker-2 W0 INFO testing 0
2013-12-02 16:44:46,003 Worker-2 W0 INFO testing 1
2013-12-02 16:44:46,003 Worker-2 W0 INFO testing 4
2013-12-02 16:44:46,003 Worker-2 W0 INFO testing 2
2013-12-02 16:44:46,003 Worker-2 W0 INFO testing 3
2013-12-02 16:44:46,003 Worker-2 W0 INFO testing 5
2013-12-02 16:44:46,003 Worker-2 W0 INFO testing 4
2013-12-02 16:44:46,003 Worker-2 W0 INFO testing 6
2013-12-02 16:44:46,003 Worker-2 W0 INFO testing 5
2013-12-02 16:44:46,004 Worker-2 W0 INFO testing 7
2013-12-02 16:44:46,003 Worker-2 W0 INFO testing 6
2013-12-02 16:44:46,004 Worker-2 W0 INFO testing 8
2013-12-02 16:44:46,004 Worker-2 W0 INFO testing 7
2013-12-02 16:44:46,004 Worker-2 W0 INFO testing 9
2013-12-02 16:44:46,004 Worker-2 W0 INFO testing 8
2013-12-02 16:44:46,004 Worker-2 W0 INFO Completed 0
2013-12-02 16:44:46,004 Worker-2 W0 INFO testing 9
2013-12-02 16:44:46,004 Worker-2 W0 INFO Completed 0
2013-12-02 16:44:46,005 MainProcess Test INFO testing 0
2013-12-02 16:44:46,005 MainProcess Test INFO testing 1
2013-12-02 16:44:46,005 MainProcess Test INFO testing 2
2013-12-02 16:44:46,005 MainProcess Test INFO testing 3
2013-12-02 16:44:46,005 MainProcess Test INFO testing 4
2013-12-02 16:44:46,005 MainProcess Test INFO testing 5
2013-12-02 16:44:46,006 MainProcess Test INFO testing 6
2013-12-02 16:44:46,006 MainProcess Test INFO testing 7
2013-12-02 16:44:46,006 MainProcess Test INFO testing 8
2013-12-02 16:44:46,006 MainProcess Test INFO testing 9
Finish
Run Code Online (Sandbox Code Playgroud)
在其他问题中,建议处理程序不止一次添加,但是,正如您所看到的,我只在主方法中添加了一次streamhanlder .我已经测试过将main方法嵌入到具有相同结果的类中.
编辑:正如@max建议的(或者我相信他说的)我已经修改了worker类的代码:
class Worker(Process):
root = logging.getLogger()
qh = None
def __init__(self, n, q):
super(Worker, self).__init__()
self.n = n
self.queue = q
if not self.qh:
Worker.qh = QueueHandler(self.queue)
Worker.root.addHandler(self.qh)
Worker.root.setLevel(logging.DEBUG)
self.logger = logging.getLogger("W%i"%self.n)
print self.root.handlers
def run(self):
self.logger.info("Worker %i Starting"%self.n)
for i in xrange(10):
self.logger.log(INFO, "testing %i"%i)
self.logger.log(INFO, "Completed %i"%self.n)
Run Code Online (Sandbox Code Playgroud)
使用相同的结果,现在队列处理程序不会一次又一次地添加,但仍然存在重复的日志条目,即使只有一个工作程序也是如此.
编辑2:我已经改变了一点代码.我更改了侦听器进程,现在使用了QueueListener(这就是我在开始时的意图),将主代码移动到了一个类.
import sys
import logging
from logging import INFO
from multiprocessing import Process, Queue as mpQueue
import threading
import time
from logutils.queue import QueueListener, QueueHandler
root = logging.getLogger()
added_qh = False
class Worker(Process):
def __init__(self, logconf, n, qh):
super(Worker, self).__init__()
self.n = n
self.logconf = logconf
# global root
global added_qh
if not added_qh:
added_qh = True
root.addHandler(qh)
root.setLevel(logging.DEBUG)
self.logger = logging.getLogger("W%i"%self.n)
#print root.handlers
def run(self):
self.logger.info("Worker %i Starting"%self.n)
for i in xrange(10):
self.logger.log(INFO, "testing %i"%i)
self.logger.log(INFO, "Completed %i"%self.n)
class Main(object):
def __init__(self):
pass
def start(self):
mpq = mpQueue(-1)
qh = QueueHandler(mpq)
h = logging.StreamHandler()
ql = QueueListener(mpq, h)
#h.setFormatter(f)
root.addHandler(qh)
l = logging.getLogger("Test")
l.setLevel(logging.DEBUG)
workers=[]
for i in xrange(15):
worker = Worker(logconf, i, qh)
worker.daemon = True
worker.start()
workers.append(worker)
for worker in workers:
print "joining worker: {}".format(worker)
worker.join()
mpq.put_nowait(None)
ql.start()
# listener.join()
for i in xrange(10):
l.info("testing %i"%i)
if __name__ == "__main__":
x = Main()
x.start()
time.sleep(10)
print "Finish"
Run Code Online (Sandbox Code Playgroud)
现在它主要起作用,直到我达到一定数量的工人(~15),因为某些原因主要课程在de join中被阻止而其他工作人员什么都不做.
我来晚了,所以你可能不再需要答案了。问题来自这样一个事实,即您已经在主进程中设置了一个处理程序,而在您的工作人员中又添加了一个处理程序。这意味着在您的工作进程中,实际上有两个处理程序在管理您的数据,一个将日志推送到队列,另一个写入流。
您可以通过self.root.handlers = []在代码中添加额外的行来解决此问题。从您的原始代码中,__init__worker的方法如下所示:
def __init__(self, n, q):
super(Worker, self).__init__()
self.n = n
self.queue = q
self.qh = QueueHandler(self.queue)
self.root = logging.getLogger()
self.root.handlers = []
self.root.addHandler(self.qh)
self.root.setLevel(logging.DEBUG)
self.logger = logging.getLogger("W%i"%self.n)
Run Code Online (Sandbox Code Playgroud)
输出现在看起来像这样:
python workers.py
2016-05-12 10:07:02,971 Worker-2 W0 INFO Worker 0 Starting
2016-05-12 10:07:02,972 Worker-2 W0 INFO testing 0
2016-05-12 10:07:02,973 Worker-2 W0 INFO testing 1
2016-05-12 10:07:02,973 Worker-2 W0 INFO testing 2
2016-05-12 10:07:02,973 Worker-2 W0 INFO testing 3
2016-05-12 10:07:02,973 Worker-2 W0 INFO testing 4
2016-05-12 10:07:02,973 Worker-2 W0 INFO testing 5
2016-05-12 10:07:02,973 Worker-2 W0 INFO testing 6
2016-05-12 10:07:02,973 Worker-2 W0 INFO testing 7
2016-05-12 10:07:02,973 Worker-2 W0 INFO testing 8
2016-05-12 10:07:02,973 Worker-2 W0 INFO testing 9
2016-05-12 10:07:02,973 Worker-2 W0 INFO Completed 0
Finish
Run Code Online (Sandbox Code Playgroud)
您的所有Workers 共享相同的根记录器对象(在调用始终返回相同的记录器中获得Worker.__init__)getLogger。但是,每次创建 时,都会向该记录器Worker添加一个处理程序 ( )。QueueHandler
因此,如果您创建 10 个 Worker,您的根记录器上将有 10 个(相同的)处理程序,这意味着输出会重复 10 次。
相反,您应该使记录器成为模块属性而不是实例属性,并在模块级别配置它一次 - 而不是在类级别。
(实际上,记录器应该在程序级别配置一次)