Continuously parsing a file in Python

Question

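I'm writing a script that parses a file containing lines of HTTP traffic, extracts the domains, and for now just prints them to the screen. I'm using httpry to continuously write the traffic to a file. Here is the script I use to pull out the domain names: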

#!/usr/bin/python

import re

input = open("results.txt","r")

for line in input:
    domain = line.split()[6]
    if domain != "-":
        print domain

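This script works fine, but I'd like a way to run it continuously, so that as new traffic is appended to the input file the script picks up the new lines and extracts their domains. I can't just run awk on httpry's output, because I will eventually be inserting these domains into a Mongo database, and I need the script to do that as well. If anyone could give me some ideas on how to keep running this Python script on the output without reprinting previous entries, I'd really appreciate it. Thanks.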

Answer 1

Mat*_*son 6

Try this tail -f implementation, taken from http://code.activestate.com/recipes/157035-tail-f-in-python/:

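import time

# Open the file that httpry is appending to (file name taken from the question).
file = open("results.txt", "r")

while 1:
    where = file.tell()  # remember the current position
    line = file.readline()
    if not line:
        # No new data yet: wait, then go back to the remembered position.
        time.sleep(1)
        file.seek(where)
    else:
        print line, # already has newline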
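
To tie this back to the question, here is a minimal sketch that combines the polling loop above with the field extraction from the asker's script. It is written for Python 3 and is only one possible arrangement: the names `follow`, `extract_domain`, `poll_interval`, `max_idle_polls` and `main` are made up for illustration and are not part of the ActiveState recipe. It assumes, as the question's script does, that the domain is the seventh whitespace-separated field and that "-" means no domain.

```python
import time


def extract_domain(line):
    """Return the 7th whitespace-separated field, or None if absent or "-"."""
    fields = line.split()
    if len(fields) <= 6 or fields[6] == "-":
        return None
    return fields[6]


def follow(handle, poll_interval=1.0, max_idle_polls=None):
    """Yield complete lines as they are appended to an open text file.

    max_idle_polls is a hypothetical knob that lets the loop stop after that
    many consecutive empty polls; None means keep polling forever.
    """
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        where = handle.tell()
        line = handle.readline()
        if line.endswith("\n"):
            idle = 0
            yield line
        else:
            # Either nothing new or a partially written line: rewind so the
            # same bytes are read again on the next poll.
            handle.seek(where)
            idle += 1
            time.sleep(poll_interval)


def main(path="results.txt"):
    """Print each new domain; a database insert could replace the print."""
    with open(path, "r") as handle:
        for line in follow(handle):
            domain = extract_domain(line)
            if domain is not None:
                print(domain)
```

Calling `main()` reads the existing contents once and then prints only lines appended afterwards, so earlier entries are not printed again. Rewinding on a line without a trailing newline is a deliberate choice here: it avoids splitting a record that the writer has only half flushed.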

