Jam*_*tta 1 python named-pipes multiprocessing io-redirection
我有一个multiprocessing.pool.map用于做一些工作的 python 脚本。随着它的进行,它会将内容打印到stdout,对于错误,它会打印到stderr。我决定为每个流都有一个单独的日志文件会很好,经过一些思考后,我应该像这样运行它:
time ./ecisSearch.py 58Ni.conf 4 1 > >(tee stdout.log) 2> >(tee stderr.log >&2)
Run Code Online (Sandbox Code Playgroud)
这为我提供了日志文件并将输出保留在适当的流上。然而问题来了。如果我在没有重定向的情况下运行它,我会得到这个:
$ time ./ecisSearch.py 58Ni.conf 4 1
2015-01-09 14:42:37.524333: This job will perform 4 fit(s) //this is stdout
2015-01-09 14:42:37.524433: Threaded mapping of the fit function onto the starting point input set is commencing //this is stdout
2015-01-09 14:42:37.526641: Starting run #: 0 //this is stdout
2015-01-09 14:42:37.527018: Starting run #: 1 //this is stdout
2015-01-09 14:42:37.529124: Starting run #: 2 //this is stdout
2015-01-09 14:42:37.529831: Starting run #: 3 //this is stdout
2015-01-09 14:42:54.052522: Test of std err writing in run 0 is finished //this is stderr
2015-01-09 14:42:54.502284: Test of std err writing in run 1 is finished //this is stderr
2015-01-09 14:42:59.952433: Test of std err writing in run 3 is finished //this is stderr
2015-01-09 14:43:03.259783: Test of std err writing in run 2 is finished //this is stderr
2015-01-09 14:43:03.260360: Finished fits in job #: 1 preparing to output data to file //this is stdout
2015-01-09 14:43:03.275472: Job finished //this is stdout
real 0m26.001s
user 0m44.145s
sys 0m32.626s
Run Code Online (Sandbox Code Playgroud)
但是,使用重定向运行它会生成以下输出。
$ time ./ecisSearch.py 58Ni.conf 4 1 > >(tee stdout.log) 2> >(tee stderr.log >&2)
2015-01-09 14:55:13.188230: Test of std err writing in run 0 is finished //this is stderr
2015-01-09 14:55:13.855079: Test of std err writing in run 1 is finished //this is stderr
2015-01-09 14:55:19.526580: Test of std err writing in run 3 is finished //this is stderr
2015-01-09 14:55:23.628807: Test of std err writing in run 2 is finished //this is stderr
2015-01-09 14:54:56.534790: Starting run #: 0 //this is stdout
2015-01-09 14:54:56.535162: Starting run #: 1 //this is stdout
2015-01-09 14:54:56.538952: Starting run #: 3 //this is stdout
2015-01-09 14:54:56.563677: Starting run #: 2 //this is stdout
2015-01-09 14:54:56.531837: This job will perform 4 fit(s) //this is stdout
2015-01-09 14:54:56.531912: Threaded mapping of the fit function onto the starting point input set is commencing //this is stdout
2015-01-09 14:55:23.629427: Finished fits in job #: 1 preparing to output data to file //this is stdout
2015-01-09 14:55:23.629742: Job finished //this is stdout
real 0m27.376s
user 0m44.661s
sys 0m33.295s
Run Code Online (Sandbox Code Playgroud)
仅查看时间戳,我们就可以看到这里发生了一些奇怪的事情。不仅是stderr和stdout流未相互穿插,因为他们是应该的,但stdout分量似乎有从子进程先东西,然后东西,从“主”过程中,无论它出现在顺序的。我知道,stderr是无缓冲的和stdout缓冲的,但这并不能解释为什么stdout信息在其自己的流中是无序的。此外,从我的帖子中不明显的是,所有stdout等待到执行结束才出现在屏幕上的事实。
我的问题如下:为什么会发生这种情况?而且,不太重要的是有没有办法解决它?
stdout 的输出是缓冲的:也就是说,print 语句实际上写入了一个缓冲区,而这个缓冲区只是偶尔刷新到终端。每个进程都有一个单独的缓冲区,这就是为什么来自不同进程的写入可能会出现乱序(这是一个常见问题,如为什么子进程stdout 到文件的写入乱序?)
在这种情况下,输出是有序的,但在重定向时出现乱序。为什么?这篇文章解释了:
- 标准输入总是被缓冲
- stderr 总是无缓冲的
- 如果 stdout 是终端,则缓冲自动设置为行缓冲,否则设置为缓冲
因此,当输出到终端时,它会刷新每一行,并且碰巧按顺序出现。重定向时,使用长缓冲区(通常为 4096 字节)。由于您打印的数量少于该数量,因此首先刷新先完成的子进程。
解决方案是使用flush()或完全禁用进程的缓冲(请参阅禁用输出缓冲)
| 归档时间: |
|
| 查看次数: |
1243 次 |
| 最近记录: |