Post by IvD*_*ogg

Shell script - Awk optimization

I'm looking for help optimizing a Bro network log parsing script. Here's the background:

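I have a large volume of Bro logs, but I'm only interested in entries involving IPs in my scope (several subnets of varying prefix length).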

So I keep a text file of regex patterns that match the IP ranges I'm looking for, scope.txt:

/^10\.0\.0\.([8-9]|[1-3][0-9]|4[0-5])$/

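To see exactly which addresses that pattern covers, it can be fed a few sample IPs through awk. This is a small sketch; `check_scope_pattern` is a hypothetical helper name, and only the pattern itself comes from scope.txt.

```shell
#!/bin/sh
# check_scope_pattern (hypothetical helper): print, one per line, only the
# arguments accepted by the scope.txt pattern for 10.0.0.8 - 10.0.0.45.
check_scope_pattern() {
    printf '%s\n' "$@" | awk '/^10\.0\.0\.([8-9]|[1-3][0-9]|4[0-5])$/'
}

# Addresses just outside the range (.7, .46, .100) are rejected.
check_scope_pattern 10.0.0.7 10.0.0.8 10.0.0.10 10.0.0.39 10.0.0.45 10.0.0.46 10.0.0.100
```

This prints 10.0.0.8, 10.0.0.10, 10.0.0.39 and 10.0.0.45, i.e. the pattern covers 10.0.0.8 through 10.0.0.45. The `$` anchor is what stops 10.0.0.100 from matching via its leading "10".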

(scope.txt contains up to 20 more lines like this, each a regex for another IP range.) findInScope.sh:

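#!/bin/sh
# For each hourly conn log, keep only the rows whose field 3 (source IP)
# or field 5 (destination IP) matches one of the patterns in scope.txt.
for file in /data/bro_logs/2016-11-26/conn.*.log.gz
do
    echo "$file"
    name=$(basename "$file" .gz)     # output is plain text, so drop .gz
    : > "/tmp/$name"                 # start from an empty temp file
    for nets in $(cat scope.txt)
    do
        echo "$nets"
        # The whole log is decompressed and re-parsed once per pattern.
        zcat "$file" | bro-cut -d | awk '$3 ~ '"$nets"' || $5 ~ '"$nets" >> "/tmp/$name"
    done
    sort -u "/tmp/$name" > ~/"$name"   # sort and drop duplicate rows
    rm "/tmp/$name"
done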


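For more context, the raw Bro conn logs are about 100 MB per hour, so my current script takes roughly 10-20 minutes to parse one hour of log data. A full day of logs can take up to 3 hours.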

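I considered a single awk statement with 40 ORs, but decided against it because I want to keep the ranges in a separate scope.txt file, so the same script can be reused with different sets of IP ranges.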
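Because the script decompresses and parses every hourly log once per pattern (about 20 times), most of the runtime is probably repeated zcat/bro-cut work. One way to keep the patterns in scope.txt while avoiding both that repetition and a hand-written 40-way OR is to have a single awk process read scope.txt first and then test each log line against all stored patterns. This is a minimal sketch, not the poster's script; `filter_scope` is a hypothetical name, and it assumes scope.txt is non-empty, holds one `/.../` pattern per line, and that fields 3 and 5 are the IPs, as in the original.

```shell
#!/bin/sh
# filter_scope SCOPE_FILE (hypothetical): read the regex patterns from
# SCOPE_FILE once, then filter stdin in a single awk process.
filter_scope() {
    awk '
        # First input (the scope file): strip the surrounding /.../ and
        # store each non-empty pattern as a dynamic regex string.
        FNR == NR {
            p = $0
            sub(/^\//, "", p)
            sub(/\/$/, "", p)
            if (p != "") pats[++n] = p
            next
        }
        # Remaining input (the log on stdin): print a line once if field 3
        # or field 5 matches any stored pattern.
        {
            for (i = 1; i <= n; i++)
                if ($3 ~ pats[i] || $5 ~ pats[i]) { print; break }
        }
    ' "$1" -
}

# Decompress and parse each hourly log once instead of once per pattern,
# still writing one deduplicated output file per hourly log.
for file in /data/bro_logs/2016-11-26/conn.*.log.gz
do
    [ -e "$file" ] || continue      # skip if the glob matched nothing
    name=$(basename "$file" .gz)
    zcat "$file" | bro-cut -d | filter_scope scope.txt | sort -u > ~/"$name"
done
```

The patterns are read as awk input rather than passed with `-v`, because `-v` values undergo escape-sequence processing that can mangle the `\.` in the patterns. The `break` prints a row only once even when several ranges match it, and `sort -u` is kept to match the original's sorted, deduplicated output.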

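I also tried running zcat on several conn.log files at once (i.e. zcat conn.*.log.gz), but the output file ended up larger than 1 GB, and I want to keep the hourly logs separate.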
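If the goal is to keep one output per hourly log while using more CPU, the hourly files can be processed concurrently instead of being concatenated. The sketch below is a hypothetical approach, not taken from the post: it uses only POSIX sh (`&` and `wait`) to run jobs in batches of `MAX_JOBS`. `process_hour`, `run_batched` and the default of 4 jobs are all assumptions; `process_hour` is a placeholder for the real per-file pipeline.

```shell
#!/bin/sh
# Hypothetical sketch: run per-hour jobs in the background, at most
# MAX_JOBS at a time, so each hourly log still gets its own output file.
MAX_JOBS=${MAX_JOBS:-4}     # assumption: tune to the number of CPU cores

# process_hour FILE: placeholder for the real per-file work, e.g.
#   zcat "$1" | bro-cut -d | <scope filter> | sort -u > ~/"$(basename "$1" .gz)"
process_hour() {
    echo "processing $1"
}

run_batched() {
    count=0
    for file in "$@"
    do
        [ -e "$file" ] || continue  # skip if the glob matched nothing
        process_hour "$file" &
        count=$((count + 1))
        # After every MAX_JOBS launches, wait for the whole batch to finish.
        if [ "$count" -ge "$MAX_JOBS" ]; then
            wait
            count=0
        fi
    done
    wait                            # wait for the last, partial batch
}

run_batched /data/bro_logs/2016-11-26/conn.*.log.gz
```

Batching is simple but each batch runs at the pace of its slowest file; where GNU or BSD `xargs -P` is available, it can keep a fixed number of jobs busy instead.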



2017-01-30
