Open a file, read its contents, put them into a list with a regular expression, then print the list in Python

Question


I am using "import re" and "import sys".

In the terminal, when I type "1.py a.txt", I want it to read "a.txt", which contains the following:

17:18:42.525964 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 1:1449, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 1448
17:18:42.526623 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 1449:2897, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 1448
17:18:42.526900 IP 192.168.0.15.34436 > 66.185.85.146.80: Flags [.], ack 2897, win 1444, options [nop,nop,TS val 43002448 ecr 3551057710], length 0
17:18:42.527694 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 2897:14481, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 11584
17:18:42.527716 IP 192.168.0.15.34436 > 66.185.85.146.80: Flags [.], ack 14481, win 1444, options [nop,nop,TS val 43002448 ecr 3551057710], length 0
17:18:42.528794 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 14481:23169, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 8688
17:18:42.528813 IP 192.168.0.15.34436 > 66.185.85.146.80: Flags [.], ack 23169, win 1444, options [nop,nop,TS val 43002448 ecr 3551057710], length 0
17:18:42.545191 IP 192.168.0.15.60030 > 52.2.63.29.80: Flags [.], seq 4113773418:4113774866, ack 850072640, win 270, options [nop,nop,TS val 43002452 ecr 9849626], length 1448

Then it should use a regular expression to strip everything except the IP addresses and the length (the total), and print the result as:

source: 66.185.85.146 dest: 192.168.0.15 total:1448
source: 66.185.85.146 dest: 192.168.0.15 total:1448
source: 192.168.0.15 dest: 66.185.85.146 total:0

But if there are duplicates, the output should look like the following, with the totals of the duplicates added together:

source: 66.185.85.146 dest: 192.168.0.15 total:2896
source: 192.168.0.15 dest: 66.185.85.146 total:0

Also, if I type "-s" in the terminal, like this:

"1.py -s a.txt"

or

"1.py a.txt -s 192.168.0.15"

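it should sort. With a bare -s it should sort the contents and print them; with -s followed by an IP, it should sort the IPs.

This is what I currently have for each piece, and I would like to know how to use them together.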

#!/usr/bin/python3
import re
import sys

file = sys.argv[1]
a = open(file, "r")

for line in a:
   line = line.rstrip()
   c = re.findall(r'^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$',line) #Yes I know its not the best regex for this, but I am testing it out for now
   d = re.findall(r'\b(\d+)$\b',line)

   if len(c) > 0 and len(d) > 0:
      print("source:", c[0],"\t","dest:",c[1],"\t", "total:",d[0])

That is what I have so far. I don't know how to handle "-s", how to sort, or how to remove duplicates and add up their totals when removing them.

Answer 1

Bop*_*reH 2

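To read -s, you will probably want a library that parses arguments, such as the standard argparse. It lets you declare which arguments the script expects, along with their descriptions, and it parses them and validates their format.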
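As a rough illustration of the argparse suggestion, here is a minimal sketch. The parser layout is an assumption, not something the question specifies: a positional "file" argument plus an "-s" option whose value is optional (nargs="?"). The names build_parser and sort are made up for the example. The sample invocations put the file before -s, which avoids any ambiguity about whether the token after -s is the file or an address.

```python
import argparse


def build_parser():
    """Build a parser for: 1.py FILE [-s [IP]] (this layout is an assumption)."""
    parser = argparse.ArgumentParser(description="Summarise tcpdump output.")
    parser.add_argument("file", help="path to the capture text file")
    # nargs="?" makes the value optional:
    #   -s absent  -> default (None)
    #   bare -s    -> const ("")
    #   -s VALUE   -> VALUE
    parser.add_argument(
        "-s",
        dest="sort",
        nargs="?",
        const="",
        default=None,
        metavar="IP",
        help="sort the output; optionally give an IP address",
    )
    return parser


parser = build_parser()
for argv in (["a.txt"], ["a.txt", "-s"], ["a.txt", "-s", "192.168.0.15"]):
    args = parser.parse_args(argv)
    print(argv, "->", args.file, repr(args.sort))
```

In a real script you would call parser.parse_args() with no argument so that it reads sys.argv.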

To sort a list, there is the sorted(my_list) function.
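For instance, sorted() accepts a key function, which is handy when each item is a tuple. The rows below are hypothetical and only mimic the (source, destination, total) shape from the question. Using ipaddress.ip_address as the key is one possible way to get numeric rather than alphabetical ordering of addresses.

```python
import ipaddress

# Hypothetical rows in the shape (source, destination, total).
rows = [
    ("192.168.0.15", "66.185.85.146", 0),
    ("66.185.85.146", "192.168.0.15", 2896),
    ("192.168.0.15", "52.2.63.29", 1448),
]

# Largest total first.
by_total = sorted(rows, key=lambda row: row[2], reverse=True)

# Numeric order of the source address. Plain string order would put
# "192.168.0.15" before "66.185.85.146" because "1" sorts before "6".
by_source = sorted(rows, key=lambda row: ipaddress.ip_address(row[0]))

for row in by_total:
    print(row)
```

sorted() returns a new list and leaves rows unchanged. It is also stable, so rows with equal keys keep their original relative order.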

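Finally, to make sure there are no duplicates, you can use a set. This loses the list order, but since you sort it afterwards, that should not be a problem.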
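A small sketch of the set approach, using made-up packet records. Note that a set only removes duplicate pairs; on its own it does not add up the lengths of the duplicates, which the question also asks for.

```python
# Hypothetical (source, destination, length) records, one per packet.
packets = [
    ("66.185.85.146", "192.168.0.15", 1448),
    ("66.185.85.146", "192.168.0.15", 1448),
    ("192.168.0.15", "66.185.85.146", 0),
]

# A set keeps one copy of each pair but has no place for the lengths.
pairs = {(source, destination) for source, destination, _ in packets}

# sorted() turns the unordered set back into an ordered list.
unique_pairs = sorted(pairs)
print(unique_pairs)
```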

Alternatively, the collections module has Counter, which is designed for adding up grouped values and sorting them.

import re
import sys
from collections import Counter

# The question's patterns are anchored with ^ and $, so they never match a
# full tcpdump line. This one captures the source IP, the destination IP
# (without their ports) and the trailing length.
LINE_RE = re.compile(
    r'IP (\d{1,3}(?:\.\d{1,3}){3})\.\d+ > '
    r'(\d{1,3}(?:\.\d{1,3}){3})\.\d+:.*length (\d+)$'
)

results = Counter()

with open(sys.argv[1]) as a:
    for line in a:
        match = LINE_RE.search(line.rstrip())
        if match:
            source, destination, length = match.groups()
            # Duplicate (source, destination) pairs add to the same total.
            results[(source, destination)] += int(length)

# Print the pairs, largest total first.
for (source, destination), length in results.most_common():
    print("source:", source, "\t", "dest:", destination, "\t", "total:", length)
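Since the question asks how to use the pieces together, here is one possible way to wire them up, as a sketch rather than a definitive implementation. Several things are assumptions: the regex presumes the exact line shape shown in the question; the helper names summarize and select_rows are invented for this example; the sort parameter stands for whatever the command-line parser yields for -s (None when absent, an empty string for a bare -s, otherwise the address); and treating "-s ip" as "keep only rows that involve that address, then sort" is just one reading of the question.

```python
import re
from collections import Counter

# Assumed line shape: "... IP <src>.<port> > <dst>.<port>: ... length <n>".
LINE_RE = re.compile(
    r"IP (\d{1,3}(?:\.\d{1,3}){3})\.\d+ > "
    r"(\d{1,3}(?:\.\d{1,3}){3})\.\d+:.*length (\d+)$"
)

# The first three lines of a.txt from the question.
SAMPLE = [
    "17:18:42.525964 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 1:1449, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 1448",
    "17:18:42.526623 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 1449:2897, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 1448",
    "17:18:42.526900 IP 192.168.0.15.34436 > 66.185.85.146.80: Flags [.], ack 2897, win 1444, options [nop,nop,TS val 43002448 ecr 3551057710], length 0",
]


def summarize(lines):
    """Sum the length field for every (source, destination) pair."""
    totals = Counter()
    for line in lines:
        match = LINE_RE.search(line.rstrip())
        if match:
            source, destination, length = match.groups()
            totals[(source, destination)] += int(length)
    return totals


def select_rows(totals, sort=None):
    """Return (source, destination, total) rows.

    sort=None keeps first-seen order, sort="" sorts by total, and any other
    value is treated as an IP address to filter on before sorting
    (an assumed reading of "-s ip").
    """
    rows = [(src, dst, total) for (src, dst), total in totals.items()]
    if sort is None:
        return rows
    if sort:
        rows = [row for row in rows if sort in row[:2]]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for src, dst, total in select_rows(summarize(SAMPLE), sort=""):
    print(f"source: {src} dest: {dst} total:{total}")
```

To run this against a real file, pass the open file object to summarize() instead of SAMPLE; a file object yields its lines one at a time, just like the list does.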

