Python: Converting tcpdump output into a text2pcap-readable format

Tar*_*oko 5 python regex networking tcpdump pcap

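I wrote a Python script that converts the text output of tcpdump -i eth0 -neXXs0 into a format that text2pcap can read. This is my first Python program, and I am looking for suggestions to improve its efficiency or readability, and to catch any potential discrepancies in the code.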

The tcpdump output format I am working with looks like this:

20:11:32.001190 00:16:76:7f:2b:b1 > 00:11:5c:78:ca:c0, ethertype IPv4 (0x0800), length 72: 123.236.188.140.41756 > 94.59.34.210.45931: UDP, length 30
    
    0x0000:  0011 5c78 cac0 0016 767f 2bb1 0800 4500  ..\x....v.+...E.
    0x0010:  003a 0000 4000 4011 812d 7bec bc8c 5e3b  .:..@.@..-{...^;
    0x0020:  22d2 a31c b36b 0026 b9bd 2033 6890 ad33  "....k.&...3h..3
    0x0030:  e845 4b8d 2ba1 0685 0cb3 70dd 9b98 76d8  .EK.+.....p...v.
    0x0040:  8fc6 8293 bf33 325a                      .....32Z

Output, in a format that text2pcap understands:

20:11:32.001190 

    0000: 00 11 5c 78 ca c0 00 16 76 7f 2b b1 08 00 45 00   ..\x....v.+...E. 
    0010: 00 3a 00 00 40 00 40 11 81 2d 7b ec bc 8c 5e 3b   .:..@.@..-{...^; 
    0020: 22 d2 a3 1c b3 6b 00 26 b9 bd 20 33 68 90 ad 33   "....k.&...3h..3
    0030: e8 45 4b 8d 2b a1 06 85 0c b3 70 dd 9b 98 76 d8   .EK.+.....p...v. 
    0040: 8f c6 82 93 bf 33 32 5a   .....32Z 

Here is my code:

import re

# Identify time of the current packet.
time = re.compile(r'(..:..:..\.[\w]*) ')
# Get individual elements from the packet, i.e. offset, hexdump, chars
all = re.compile(r'[ |\t]+0x([\w]+:) +(.+)  +(.*)')
# Regex for two spaces
twoSpaces = re.compile('  +')
# Regex for single space
singleSpace = re.compile(' ')
# Single byte pattern.
singleBytePattern = re.compile(r'([\w][\w])')

# Open files.
f = open('pcap.txt', 'r')
outfile = open('ashu.txt', 'w')

for line in f:
    result = time.match(line)
    if result:
        # If current line contains time format dump only time
        print(result.group())
        outfile.write(result.group() + '\n')
    else:
        print(line)
        # Split line containing hex dump and tokenize into list elements.
        result = all.split(line)
        if result:
            i = 0
            for values in result:
                if i == 2:
                    # Strip off additional spaces in hex dump
                    # Useful when hex dump does not end in 16 bytes boundary.
                    val = twoSpaces.sub('', values)

                    # Tokenize individual elements separated by single space.
                    byteResult = singleSpace.split(val)
                    for twoByte in byteResult:
                        # Identify individual byte
                        singleByte = singleBytePattern.split(twoByte)
                        byteOffset = 0
                        for oneByte in singleByte:
                            if byteOffset == 1 or byteOffset == 3:
                                # Write out individual byte with a space char appended
                                print(oneByte, end=' ')
                                outfile.write(oneByte + ' ')
                            byteOffset += 1
                elif i == 3:
                    # Write of char format of hex dump
                    print("  " + values, end='')
                    outfile.write('  ' + values + ' ')
                elif i == 4:
                    outfile.write(values)
                else:
                    print(values, end=' ')
                    outfile.write(values + ' ')
                i += 1
        else:
            print("could not split")
f.close()
outfile.close()
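
For comparison, the same transformation can be sketched with plain string operations instead of counting indexes over regex split results. This is only a sketch under assumptions: the names convert_line, convert and convert_file are hypothetical, and it assumes the hex column is separated from the ASCII column by at least two spaces, as in the sample dump above. Whitespace at the edges of the ASCII column is not preserved.

```python
import re

# Leading timestamp of a tcpdump header line, e.g. "20:11:32.001190".
TIMESTAMP_RE = re.compile(r'\d\d:\d\d:\d\d\.\d+')


def convert_line(line):
    """Return the converted form of one tcpdump text line, or None to skip it."""
    stripped = line.strip()
    match = TIMESTAMP_RE.match(stripped)
    if match:
        # Header line: keep only the timestamp.
        return match.group()
    if not stripped.startswith('0x'):
        # Blank or unrecognised line.
        return None
    # "0x0000:  0011 5c78 ...  ascii" -> offset "0000" and the remainder.
    offset, _, rest = stripped[2:].partition(':')
    # Assumption: two or more spaces separate the hex and ASCII columns.
    hex_part, _, ascii_part = rest.strip().partition('  ')
    digits = hex_part.replace(' ', '')
    # Regroup "0011 5c78" into single bytes: "00 11 5c 78".
    byte_text = ' '.join(digits[i:i + 2] for i in range(0, len(digits), 2))
    return f'    {offset}: {byte_text}   {ascii_part.strip()}'


def convert(lines):
    """Yield converted lines for an iterable of tcpdump text lines."""
    for line in lines:
        converted = convert_line(line)
        if converted is not None:
            yield converted


def convert_file(src_path, dst_path):
    """Convert a whole text dump; both files are closed automatically."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for converted in convert(src):
            dst.write(converted + '\n')
```

With the file names from the script above, the call would be convert_file('pcap.txt', 'ashu.txt'). Keeping the per-line logic in its own function makes it possible to check it against single lines of the sample dump without touching any files.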

gho*_*g74 3

Use tcpdump's -w option to write a file in pcap format directly:

tcpdump -w filename.pcap

Wireshark should be able to read it.
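
If the capture is started from Python anyway, the same command can be launched with subprocess. This is a minimal sketch, assuming tcpdump is on the PATH and the process has capture privileges. build_capture_command and capture are hypothetical helper names, and the -i and -c flags are additions that the answer itself does not mention.

```python
import subprocess


def build_capture_command(interface, pcap_path, packet_count=None):
    """Build a tcpdump argument list that writes raw packets to pcap_path."""
    command = ['tcpdump', '-i', interface, '-w', pcap_path]
    if packet_count is not None:
        # -c makes tcpdump exit after receiving this many packets.
        command += ['-c', str(packet_count)]
    return command


def capture(interface, pcap_path, packet_count=None):
    """Run tcpdump and raise CalledProcessError if it exits with an error."""
    command = build_capture_command(interface, pcap_path, packet_count)
    subprocess.run(command, check=True)
```

For example, capture('eth0', 'filename.pcap', packet_count=100) would stop after 100 packets. Passing the arguments as a list avoids shell quoting problems with the file name.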