Twi*_*ell 0 awk python perl python3
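I need to filter bot activity out of a log file.
The solution should display only the records that meet the following conditions
Sample input data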
[a lot of data]
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged in| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user changed password| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged off| -
Mon, 22 Aug 2016 13:15:42 +0200|178.57.66.225|faaaaaa11111| - |user logged in| -
Mon, 22 Aug 2016 13:15:40 +0200|178.57.66.215|terdsfsdfsdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|terdsfsdfsdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|terdsfsdfsdf| - |user logged off| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged off| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user logged in| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user changed password| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user changed profile| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|abcbbabab| - |user logged off| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged in| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user changed password| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged off| -
Mon, 22 Aug 2016 13:20:42 +0200|178.57.67.225|faaaa0a11111| - |user logged in| -
[a lot of data]
I wrote the code below to accomplish the task
awk 'BEGIN { FS=" " } { c[$5]++; l[$5,c[$5]]=$0 } END { for (i in c) { if (c[i] == 3) for (j = 1 ; j <= c[i]; j++) print l[i,j] } }' $1
Usage:
./parse_log.sh logfile.log
Output:
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged in| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user changed password| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|fxsciaqulmlk| - |user logged off| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|erdsfsdfsdf| - |user logged off| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged in| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user changed password| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|fxsciaqulmla| - |user logged off| -
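But I would like to know what an alternative solution written in Perl or Python (with minimal use of external libraries) would look like.
This isn't an answer, but it's too big for a comment and needs formatting. To address your comment "Python code is much easier to read and understand what it does.": FYI, here is an AWK script with sensible variable names that does what I think your Python script does. It looks a lot like your Python script but is much briefer because, for manipulating text, awk already does all the common things for you that you have to write code for in Python: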
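awk -v column=5 '
    # Collect every line under the value of its key field (field 5 is the time).
    { records[$column] = records[$column] $0 ORS }
    END {
        for ( timestamp in records ) {
            # gsub() returns the number of ORS it replaced, i.e. the line count.
            # Keep only groups of exactly 3 lines, as in the expected output.
            if ( gsub(ORS,"&",records[timestamp]) == 3 ) {
                printf "%s", records[timestamp]
            }
        }
    }
' logfile.log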
But reading the whole file into memory before processing it is a very inefficient way to approach this problem. You should instead test and print each time the time changes:
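awk -v column=5 '
    # The key field changed: flush the previous group and start a new one.
    $column != prev {
        prt()
        records = ""
        prev = $column
    }
    { records = records $0 ORS }
    END { prt() }
    function prt() {
        # Print the buffered group only if it holds exactly 3 lines.
        if ( gsub(ORS,"&",records) == 3 ) {
            printf "%s", records
        }
    }
' logfile.log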
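For comparison, the same streaming idea can be sketched in Python with `itertools.groupby`, which also holds only one run of lines in memory at a time. This is a minimal sketch, not a drop-in replacement: the names `time_field` and `filter_bursts` are made up for illustration, the key is assumed to be whitespace-separated field 5 (the time), and, like the awk version, it assumes that records sharing a time are adjacent in the file.

```python
from itertools import groupby


def time_field(line, column=5):
    """Return whitespace-separated field `column` (1-based), or "" if the
    line has fewer fields. Field 5 is the HH:MM:SS part in the sample log."""
    fields = line.split()
    return fields[column - 1] if len(fields) >= column else ""


def filter_bursts(lines, size=3, column=5):
    """Yield the lines of every run of exactly `size` consecutive lines that
    share the same key field. Assumes equal keys are adjacent in the input."""
    for _, run in groupby(lines, key=lambda line: time_field(line, column)):
        run = list(run)  # only the current run is buffered
        if len(run) == size:
            yield from run
```

It could be driven from a file with something like `sys.stdout.writelines(filter_bursts(open("logfile.log")))`. Unlike the dictionary-based approach, two separate runs that happen to share a time are counted separately rather than merged.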
The solution itself is written in Python 3.
#!/usr/bin/env python3

import sys
import re
from collections import defaultdict


column_delimiter = sys.argv[1]
column = int(sys.argv[2]) - 1  # convert the 1-based column number to an index

# Map each key (the value of the chosen column) to the lines that contain it.
records = defaultdict(list)

with open(sys.argv[3]) as inputfile:
    for raw_line in inputfile:
        line = raw_line.rstrip('\n')
        row_record = line.split(column_delimiter)
        records[row_record[column]].append(line)

# Print only groups of exactly three lines, and only the matching actions.
for timestamps in records.values():
    if len(timestamps) == 3:
        for record in timestamps:
            if re.search('logged in|changed password|logged off', record):
                print(record)
Usage: parse_log.py ' ' 5 logfile.log
The Python code is easier to read, and it is easier to see what it does.