How can I tail a log file in Python?

Question

How can I tail a log file in Python?

Eli*_*Eli 65 python tail

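I'd like to make the output of tail -F, or something similar, available to me in Python without blocking or locking. I've found some really old code that does this here, but I'm thinking there must be a better way, or a library that does the same thing, by now. Does anyone know of one?

Ideally, I'd have something like tail.getNewData() that I could call every time I wanted more data.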

Answer 1

Mat*_*att 60

Non-blocking

If you are on Linux (Windows does not support calling select on files), you can use the subprocess module together with the select module.

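import time
import subprocess
import select

filename = "/var/log/some_log_file.log"  # placeholder: path of the log file to follow

# text=True makes readline() return str instead of bytes (Python 3.7+)
f = subprocess.Popen(['tail', '-F', filename],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
p = select.poll()
p.register(f.stdout)

while True:
    if p.poll(1):  # wait up to 1 ms for output from tail
        print(f.stdout.readline(), end='')
    time.sleep(1)  # stand-in for real work; limits reading to one line per second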


This polls the output pipe for new data and prints it when it is available. Normally the time.sleep(1) call and the print of f.stdout.readline() would be replaced with useful code.
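
The polling step can also be wrapped in a small function that returns whatever is ready and never waits for a full line. This is only a sketch: `read_available` and its `timeout_ms` parameter are names made up here, and it assumes a platform where `select.poll` exists (Linux, as above). It reads the raw file descriptor with `os.read`, so it is probably best not to mix it with `f.stdout.readline()` on the same pipe.

```python
import os
import select


def read_available(fd, timeout_ms=0):
    """Return the bytes currently readable on fd, or b'' if nothing
    becomes readable within timeout_ms milliseconds."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    if not poller.poll(timeout_ms):
        return b""  # nothing ready, return immediately
    # Data (or end-of-file) is ready, so this read does not block.
    return os.read(fd, 65536)
```

With the Popen object from the answer, a call would look like `read_available(f.stdout.fileno())`.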

Blocking

You can use the subprocess module without the extra select module calls.

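import subprocess

filename = "/var/log/some_log_file.log"  # placeholder: path of the log file to follow
f = subprocess.Popen(['tail', '-F', filename],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
while True:
    line = f.stdout.readline()  # blocks until tail writes a full line
    if not line:  # empty string: tail has exited
        break
    print(line, end='')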


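This will also print new lines as they are added, but it will block until the tail program is closed, probably with f.kill().

This can only read at most one line per second, which is a problem if the log grows by more than one line per second. (3 upvotes)
In the "blocking" solution, use `sys.stdout.write(line)` instead of `print line` to deal with the extra newline that print would insert. (2 upvotes)
@mork Is an extra newline being printed that shouldn't be? In any case, I believe `.strip()` would also remove leading whitespace that may be important. (2 upvotes)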

Answer 2

Pau*_*ine 37

Using the sh module (pip install sh):

from sh import tail
# runs forever
for line in tail("-f", "/var/log/some_log_file.log", _iter=True):
    print(line)


[update]

Since sh.tail with _iter=True is a generator, you can:

import sh
tail = sh.tail("-f", "/var/log/some_log_file.log", _iter=True)


Then you can "getNewData" with:

new_data = next(tail)


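Note that if the tail buffer is empty, it will block until there is more data (from your question it is not clear what you want to do in this case).

[update]

This works if you replace -f with -F, but in Python it locks. I'd be more interested in having a function I could call to get new data when I want it, if that's possible. - Eli

A container generator that places the tail call inside a while True loop and catches eventual I/O exceptions will have almost the same effect as -F.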

import sh

def tail_F(some_file):
    while True:
        try:
            for line in sh.tail("-f", some_file, _iter=True):
                yield line
        except sh.ErrorReturnCode_1:
            # tail exited with status 1 (e.g. the file is not accessible)
            yield None


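If the file becomes inaccessible, the generator will return None. However, it still blocks until there is new data if the file is accessible. It remains unclear to me what you want to do in this case.

Raymond Hettinger's approach seems pretty good: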

import os

def tail_F(some_file):
    first_call = True
    while True:
        try:
            with open(some_file) as log_file:
                if first_call:
                    log_file.seek(0, 2)  # start at the end of the file
                    first_call = False
                latest_data = log_file.read()
                while True:
                    if '\n' not in latest_data:
                        latest_data += log_file.read()
                        if '\n' not in latest_data:
                            yield ''  # no complete line available yet
                            if not os.path.isfile(some_file):
                                break  # file is gone: reopen it
                            continue
                    latest_lines = latest_data.split('\n')
                    if latest_data[-1] != '\n':
                        latest_data = latest_lines[-1]  # keep the partial line
                    else:
                        latest_data = log_file.read()
                    for line in latest_lines[:-1]:
                        yield line + '\n'
        except IOError:
            yield ''


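This generator will return '' if the file becomes inaccessible or if there is no new data.

[update]

The second to last answer circles around to the top of the file, it seems, whenever it runs out of data. - Eli

I think the second one will output the last ten lines whenever the tail process ends, which with -f is whenever there is an I/O error. The tail --follow --retry behavior is not far from this for most cases I can think of in Unix-like environments.

Perhaps if you update your question to explain what your real goal is (the reason why you want to mimic tail), you will get a better answer.

The last answer does not actually follow the tail and merely reads what's available at run time. - Eli

Of course, tail displays the last 10 lines by default... You can position the file pointer at the end of the file using file.seek; I will leave a proper implementation as an exercise for the reader.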
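
For what it's worth, the seek-to-end idea might look roughly like the sketch below. The `Tail` class and its `get_new_data` method are hypothetical names picked to match the question, not part of any library, and the sketch ignores rotation and truncation entirely.

```python
import os


class Tail:
    """Hypothetical helper: hands back only data appended after it was created."""

    def __init__(self, path):
        self._f = open(path, "r")
        # Jump to the end so content already in the file is skipped.
        self._f.seek(0, os.SEEK_END)

    def get_new_data(self):
        # read() on a regular file returns '' at end-of-file instead of
        # waiting, so this call does not block when nothing is new.
        return self._f.read()

    def close(self):
        self._f.close()
```

Calling `get_new_data()` repeatedly returns each appended piece once, and an empty string when nothing new has been written.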

IMHO the file.read() approach is far more elegant than a subprocess-based solution.

Answer 3

nne*_*neo 22

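Actually, the only portable way to tail -f a file appears to be to read from it and retry (after a sleep) if the read returns 0. The tail utilities on various platforms use platform-specific tricks (e.g. kqueue on BSD) to efficiently tail a file forever without needing sleep.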
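
As a rough illustration of that read-and-retry loop (a sketch only; `follow_chunks`, `sleep_sec` and `max_idle_polls` are made-up names, and `max_idle_polls` exists just so the loop can be stopped in a demo):

```python
import time


def follow_chunks(f, sleep_sec=0.5, max_idle_polls=None):
    """Yield data appended to the open file f, sleeping after each empty read.

    max_idle_polls is a hypothetical demo knob: stop after that many
    consecutive empty reads. The default None means run forever.
    """
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        chunk = f.read()
        if chunk:
            idle = 0
            yield chunk
        else:
            # The read returned nothing: wait a bit, then retry.
            idle += 1
            time.sleep(sleep_sec)
```

The sleep interval is the trade-off: a short interval costs CPU while the file is idle, a long one adds latency.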

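Therefore, implementing a good tail -f purely in Python is probably not a good idea, since you would have to use the least-common-denominator implementation (without resorting to platform-specific hacks). By using a simple subprocess to open tail -f and iterating over its lines in a separate thread, you can easily implement a non-blocking tail operation in Python.

Example implementation: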

import queue
import subprocess
import threading

fn = "/var/log/some_log_file.log"  # placeholder: path of the file to follow
tailq = queue.Queue(maxsize=10)  # buffer at most 10 lines

def tail_forever(fn):
    p = subprocess.Popen(["tail", "-f", fn], stdout=subprocess.PIPE, text=True)
    while True:
        line = p.stdout.readline()
        tailq.put(line)  # blocks while the queue is full
        if not line:  # empty string: tail has exited
            break

threading.Thread(target=tail_forever, args=(fn,)).start()

print(tailq.get())  # blocks
print(tailq.get_nowait())  # raises queue.Empty if there are no lines to read


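English is not my native language, but I think it can be inferred from the question title (How can I tail a **log file** in Python?). (10 upvotes)
If the OP's main concern is not getting rid of the dependency on an external command (tail), he should follow the Unix tradition: write the log processor application to read from stdin and pipe `tail -F` into it. I don't see why adding the complexity of threads, queues and subprocesses would bring any advantage over the traditional approach. (4 upvotes)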
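
A minimal sketch of that stdin-based approach, assuming a hypothetical script name `processor.py` and a made-up `process_lines` helper:

```python
import sys


def process_lines(stream, handle):
    """Call handle() on each line of stream, with the trailing newline removed."""
    for line in stream:
        handle(line.rstrip("\n"))


# In the script itself one would call:
#     process_lines(sys.stdin, print)
# and run it from the shell as:
#     tail -F /var/log/some_log_file.log | python processor.py
```

Here tail does the following and the retrying, and the Python side only sees an ordinary stream of lines.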

Answer 4

Isa*_*ner 17

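Pure Pythonic solution using non-blocking readline()

Adapting Ijaz Ahmad Khan's answer so that it only yields lines once they are completely written (i.e. they end with a newline character) gives a Pythonic solution with no external dependencies: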

import time
from typing import Iterator


def follow(file, sleep_sec=0.1) -> Iterator[str]:
    """ Yield each line from a file as they are written.
    `sleep_sec` is the time to sleep after empty reads. """
    line = ''
    while True:
        tmp = file.readline()
        if tmp:  # readline() returns '' (never None) at end of file
            line += tmp
            if line.endswith("\n"):
                yield line
                line = ''
        elif sleep_sec:
            time.sleep(sleep_sec)


if __name__ == '__main__':
    with open("test.txt", 'r') as file:
        for line in follow(file):
            print(line, end='')


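@creativecoding This answer is indeed better than any of the earlier ones that suggest spawning a `tail -f` instance. Upvoted. (3 upvotes)
Ijaz Ahmad's solution and this one are not only more Pythonic, they also avoid spawning a new process, which saves resources and may scale better depending on the situation. (2 upvotes)
"else if" is not Python - edited, and I agree that this is better than "sh" and "tail" (2 upvotes)
High load while the file is idle, but this is easily fixed by changing `if tmp is not None` to `if tmp != ""`. (2 upvotes)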

Answer 5

Eli*_*Eli 12

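So, this is coming quite late, but I ran into the same problem again, and there's a much better solution now. Just use pygtail:

Pygtail reads log file lines that have not been read. It will even handle log files that have been rotated. Based on logcheck's logtail2 (http://logcheck.org)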
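
To illustrate the general idea of "only lines that have not been read yet", here is a stdlib-only sketch that remembers a byte offset in a side file. It is not Pygtail's code or API: `read_unread_lines` and `offset_path` are made-up names, and unlike Pygtail it does not go looking for the rotated file; it simply starts over when the log shrinks.

```python
import os


def read_unread_lines(path, offset_path):
    """Return the complete lines added to path since the previous call.

    The byte offset reached so far is stored in offset_path (a hypothetical
    side file), so the position survives between runs of the program.
    """
    try:
        with open(offset_path) as fh:
            offset = int(fh.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        offset = 0
    if os.path.getsize(path) < offset:
        offset = 0  # the file shrank: assume truncation or rotation
    with open(path, "rb") as log:
        log.seek(offset)
        data = log.read()
    # Consume only up to the last newline; an unfinished final line
    # is left in place for the next call.
    end = data.rfind(b"\n") + 1
    with open(offset_path, "w") as fh:
        fh.write(str(offset + end))
    return data[:end].decode("utf-8", errors="replace").splitlines(keepends=True)
```

Each call returns an empty list when nothing new has been written.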

Answer 6

Ija*_*han 12

None of the answers that use tail -f are Pythonic.

Here is the Pythonic way (using no external tools or libraries):

import time


def follow(thefile):
    while True:
        line = thefile.readline()
        if not line or not line.endswith('\n'):
            # Nothing new, or only part of a line so far: wait and retry.
            # Note that a partial line read here is dropped.
            time.sleep(0.1)
            continue
        yield line


if __name__ == '__main__':
    with open("run/foo/access-log", "r") as logfile:
        loglines = follow(logfile)
        for line in loglines:
            print(line, end='')


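If the log file is appended to in 2 system calls, this way of "following" a file will sometimes return the line in 2 parts rather than as one whole line. (2 upvotes)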

Answer 7

Ray*_*ger 9

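Ideally, I'd have something like tail.getNewData() that I could call every time I wanted more data

We already have one, and it's very nice. Just call f.read() whenever you want more data. It will start reading where the previous read left off and will read through to the end of the data stream: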

f = open('somefile.log')
p = 0
while True:
    f.seek(p)  # resume where the previous read stopped
    latest_data = f.read()
    p = f.tell()
    if latest_data:
        print(latest_data)
        print(str(p).center(10).center(80, '='))  # banner showing the new offset


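To read line by line, use f.readline(). Sometimes, the file being read will end with a partially written line. Handle that case by using f.tell() to find the current file position and f.seek() to move the file pointer back to the beginning of the incomplete line. See this ActiveState recipe for working code.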
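
A small sketch of that tell()/seek() rewind (this is not the ActiveState recipe; `read_complete_lines` is a made-up name):

```python
def read_complete_lines(f):
    """Return the complete lines currently available from the open file f.

    If the last line has no trailing newline yet, the file pointer is moved
    back to its start so the whole line can be read on a later call.
    """
    lines = []
    while True:
        pos = f.tell()  # remember where this line begins
        line = f.readline()
        if not line:
            break  # end of file reached
        if not line.endswith("\n"):
            f.seek(pos)  # incomplete line: rewind and stop
            break
        lines.append(line)
    return lines
```

Calling it again after the writer has finished the line returns that line in one piece.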

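@Paulo: That is important information missing from the answer. If no OS is specified, you build something that works in general, or at least on *nix. You never assume Windows. (8 upvotes)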

Answer 8

Har*_*_OK 6

You can use the 'tailer' library: https://pypi.python.org/pypi/tailer/

It has an option to get the last few lines of a file:

import tailer

# Get the last 3 lines of the file
tailer.tail(open('test.txt'), 3)
# ['Line 9', 'Line 10', 'Line 11']


It can also follow a file:

# Follow the file as it grows
for line in tailer.follow(open('test.txt')):
    print(line)


If you want tail-like behaviour, this seems to be a good option.
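
For comparison, the "last few lines" part can be approximated with the standard library alone. This is a sketch, not how tailer is implemented; `last_lines` is a made-up name, and it reads through the whole file, which may be slow for very large logs.

```python
from collections import deque


def last_lines(path, n=10):
    """Return the last n lines of the file at path, without trailing newlines."""
    with open(path) as f:
        # A deque with maxlen keeps only the n most recent lines seen.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```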

Answer 9

Ken*_*tzo 5

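Another option is the tailhead library, which provides Python versions of the tail and head utilities along with an API that can be used in your own module.

Originally based on the tailer module, its main advantage is the ability to follow files by path, i.e. it can handle the situation when the file is recreated. It also has some bug fixes for various edge cases.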
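
To give a feel for what "following by path" involves, here is a stdlib-only sketch. It is not tailhead's code or API: `PathFollower` and `poll` are made-up names, it assumes a POSIX system where a replaced file gets a different inode, and it reads a newly found file from the beginning.

```python
import os


class PathFollower:
    """Hypothetical sketch: follow a path, reopening it when the file is replaced."""

    def __init__(self, path):
        self.path = path
        self._f = None
        self._buf = b""

    def _replaced(self):
        """True if path now names a different, or a truncated, file."""
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return False  # gone for the moment: keep the old handle
        if self._f is None:
            return True
        cur = os.fstat(self._f.fileno())
        different = (st.st_dev, st.st_ino) != (cur.st_dev, cur.st_ino)
        return different or st.st_size < self._f.tell()

    def poll(self):
        """Return the complete lines written since the last call (maybe [])."""
        # Drain whatever is left in the currently open file first.
        data = self._f.read() if self._f else b""
        if self._replaced():
            if self._f:
                self._f.close()
            self._f = open(self.path, "rb")
            data += self._f.read()
        # Everything before the last newline is complete; keep the rest.
        *lines, self._buf = (self._buf + data).split(b"\n")
        return [ln.decode("utf-8", errors="replace") + "\n" for ln in lines]
```

Calling `poll()` from a loop with a short sleep gives behaviour roughly in the spirit of tail -F.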
