Parsing binary data into a ctypes Structure object via readinto()

mon*_*kut 7 c python ctypes data-structures

I'm trying to handle a binary format, following this example:

http://dabeaz.blogspot.jp/2009/08/python-binary-io-handling.html

>>> from ctypes import *
>>> class Point(Structure):
...     _fields_ = [ ('x',c_double), ('y',c_double), ('z',c_double) ]
...
>>> g = open("foo","rb") # point structure data
>>> q = Point()
>>> g.readinto(q)
24
>>> q.x
2.0

I have defined a structure for my header and I'm trying to read data into it, but I'm running into some difficulties. My structure looks like this:

class BinaryHeader(BigEndianStructure):
    _fields_ = [
                ("sequence_number_4bytes", c_uint),
                ("ascii_text_32bytes", c_char),
                ("timestamp_4bytes", c_uint),
                ("more_funky_numbers_7bytes", c_uint, 56),
                ("some_flags_1byte", c_byte),
                ("other_flags_1byte", c_byte),
                ("payload_length_2bytes", c_ushort),

                ] 

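The ctypes documentation says:

For integer type fields like c_int, a third optional item can be given. It must be a small positive integer defining the bit width of the field.

So with ("more_funky_numbers_7bytes", c_uint, 56), I tried to define the field as a 7-byte field, but I get the error:

ValueError: number of bits invalid for bit field

So my first question is: how can I define a 7-byte int field?

Then, if I skip that problem and comment out the "more_funky_numbers_7bytes" field, the data does get loaded... but, as expected, only 1 character is loaded into "ascii_text_32bytes". And for some reason readinto() returns 16, which I assume is the number of bytes it read into the structure... but with my "funky numbers" field commented out and "ascii_text_32bytes" holding only one char (1 byte), shouldn't that be 13, not 16???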

Then I tried breaking the char field out into a separate structure and referencing it from my header structure. But that didn't work either...

class StupidStaticCharField(BigEndianStructure):
    _fields_ = [
                ("ascii_text_1", c_byte),
                ("ascii_text_2", c_byte),
                ("ascii_text_3", c_byte),
                ("ascii_text_4", c_byte),
                ("ascii_text_5", c_byte),
                ("ascii_text_6", c_byte),
                ("ascii_text_7", c_byte),
                ("ascii_text_8", c_byte),
                ("ascii_text_9", c_byte),
                ("ascii_text_10", c_byte),
                ("ascii_text_11", c_byte),
                .
                .
                .
                ]

class BinaryHeader(BigEndianStructure):
    _fields_ = [
                ("sequence_number_4bytes", c_uint),
                ("ascii_text_32bytes", StupidStaticCharField),
                ("timestamp_4bytes", c_uint),
                #("more_funky_numbers_7bytes", c_uint, 56),
                ("some_flags_1byte", c_ushort),
                ("other_flags_1byte", c_ushort),
                ("payload_length_2bytes", c_ushort),

                ] 

So, any ideas on how to:

  1. Define a 7-byte field (which I'll need to decode with a custom function)
  2. Define a fixed-size 32-byte char field

UPDATE

I found a structure that seems to work...

class BinaryHeader(BigEndianStructure):
    _fields_ = [
                ("sequence_number_4bytes", c_uint),
                ("ascii_text_32bytes", c_char * 32),
                ("timestamp_4bytes", c_uint),
                ("more_funky_numbers_7bytes", c_byte * 7),
                ("some_flags_1byte", c_byte),
                ("other_flags_1byte", c_byte),
                ("payload_length_2bytes", c_ushort),

                ]  

However, I'm now left wondering why, when using .readinto():

f = open(binaryfile, "rb")

mystruct = BinaryHeader()
f.readinto(mystruct)

it returns 52 instead of the expected 51. Where does the extra byte come from, and where does it go?

UPDATE 2 For those interested, here is an example of the alternative approach mentioned by eryksun, using struct to read values into a namedtuple:

>>> from struct import unpack
>>> record = 'raymond   \x32\x12\x08\x01\x08'
>>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

>>> from collections import namedtuple
>>> Student = namedtuple('Student', 'name serialnum school gradelevel')
>>> Student._make(unpack('<10sHHb', record))
Student(name='raymond   ', serialnum=4658, school=264, gradelevel=8)

小智 6

This line actually defines a bit field:

...
("more_funky_numbers_7bytes", c_uint, 56),
...

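which is wrong here. The width of a bit field must be less than or equal to the size of its type, so for c_uint it can be at most 32. A single extra bit raises the exception: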

ValueError: number of bits invalid for bit field
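
The width limit can be checked directly. The following is a minimal sketch, assuming Python 3; the helper name `bitfield_is_accepted` and the `Probe` class are made up for illustration and are not part of ctypes.

```python
from ctypes import Structure, c_uint, sizeof

def bitfield_is_accepted(width):
    """Return True if ctypes accepts a c_uint bit field of this width."""
    try:
        # The check happens when the class is created, not when it is used.
        class Probe(Structure):
            _fields_ = [("value", c_uint, width)]
    except ValueError:
        return False
    return True

max_bits = sizeof(c_uint) * 8                # 32 on common platforms
print(bitfield_is_accepted(max_bits))        # widest width that fits the type
print(bitfield_is_accepted(max_bits + 1))    # one bit too many
print(bitfield_is_accepted(56))              # the width tried in the question
```

Only the first call should succeed; the other two hit the ValueError quoted above.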

An example of using bit fields:

from ctypes import *

class MyStructure(Structure):
    _fields_ = [
        # c_uint8 is 8 bits wide
        ('a', c_uint8, 4), # first 4 bits of the first byte
        ('b', c_uint8, 2), # next 2 bits of the first byte
        ('c', c_uint8, 2), # last 2 bits of the first byte
        ('d', c_uint8, 2), # the first byte is now full, so a new
                           # byte is created and `d` takes its
                           # first two bits
    ]

mystruct = MyStructure()

mystruct.a = 0b0000
mystruct.b = 0b11
mystruct.c = 0b00
mystruct.d = 0b11

v = c_uint16()

# copy the raw bytes of `mystruct` into `v` (ctypes.memmove is portable)
memmove(byref(v), byref(mystruct), sizeof(v))

print(sizeof(mystruct)) # 2 bytes, so 6 bits are left floating, you may
                        # want to memset with zeros
print(bin(v.value))     # 0b1100110000 on a little-endian machine

What you need is 7 bytes, so what you ended up doing is correct:

...
("more_funky_numbers_7bytes", c_byte * 7),
...
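
Since the question mentions decoding the 7-byte field with a custom function, here is one possible sketch. It assumes Python 3 (for `int.from_bytes`) and assumes the seven bytes form a single big-endian unsigned integer, which the question does not confirm. The name `decode_uint56_be` is hypothetical.

```python
from ctypes import c_byte

def decode_uint56_be(raw):
    """Interpret a 7-element ctypes byte array as one big-endian unsigned int."""
    # bytes() copies the raw memory of the ctypes array via the buffer protocol.
    return int.from_bytes(bytes(raw), byteorder="big")

# Stand-in for header.more_funky_numbers_7bytes after a readinto() call.
raw = (c_byte * 7)(1, 2, 3, 4, 5, 6, 7)
value = decode_uint56_be(raw)
print(hex(value))
```

If the field actually packs several smaller values, the same `bytes(raw)` call gives a plain byte string that can be sliced instead.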

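As for the size of the structure, it will be 52. ctypes follows the C compiler's alignment rules, so one padding byte is inserted to place the 2-byte payload_length_2bytes field at an even offset (the total, 52, is also a multiple of 4, the alignment of c_uint). Here: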

from ctypes import *

class BinaryHeader(BigEndianStructure):
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char * 32),
        ("timestamp_4bytes", c_uint),
        ("more_funky_numbers_7bytes", c_byte * 7),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

mystruct = BinaryHeader(
    0x11111111,
    b'\x22' * 32,
    0x33333333,
    (c_byte * 7)(*([0x44] * 7)),
    0x55,
    0x66,
    0x7777
)

print(sizeof(mystruct))  # 52

with open('data.txt', 'wb') as f:
    f.write(mystruct)

The extra byte is padded between other_flags_1byte and payload_length_2bytes in the file:

00000000 11 11 11 11 ....
00000004 22 22 22 22 """"
00000008 22 22 22 22 """"
0000000C 22 22 22 22 """"
00000010 22 22 22 22 """"
00000014 22 22 22 22 """"
00000018 22 22 22 22 """"
0000001C 22 22 22 22 """"
00000020 22 22 22 22 """"
00000024 33 33 33 33 3333
00000028 44 44 44 44 DDDD
0000002C 44 44 44 55 DDDU
00000030 66 00 77 77 f.ww
            ^
         extra byte
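
The location of the padding can also be read from the field descriptors without writing a file. This is a small sketch, assuming Python 3 and a platform where c_uint is 4 bytes; `field_layout` and `FirstAttempt` are names made up for illustration. `FirstAttempt` reproduces the question's first layout with the 7-byte field commented out, which would explain why readinto() reported 16 rather than 13 there.

```python
from ctypes import BigEndianStructure, c_byte, c_char, c_uint, c_ushort, sizeof

class BinaryHeader(BigEndianStructure):
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char * 32),
        ("timestamp_4bytes", c_uint),
        ("more_funky_numbers_7bytes", c_byte * 7),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

class FirstAttempt(BigEndianStructure):
    # Single c_char instead of c_char * 32, and no 7-byte field.
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char),
        ("timestamp_4bytes", c_uint),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

def field_layout(struct_type):
    """Return a list of (name, offset, size) for each field."""
    layout = []
    for field in struct_type._fields_:
        descriptor = getattr(struct_type, field[0])
        layout.append((field[0], descriptor.offset, descriptor.size))
    return layout

for struct_type in (BinaryHeader, FirstAttempt):
    print(struct_type.__name__, "sizeof =", sizeof(struct_type))
    for name, offset, size in field_layout(struct_type):
        print("  offset %2d  size %2d  %s" % (offset, size, name))
```

A gap between one field's `offset + size` and the next field's `offset` is padding. In `BinaryHeader` the gap is the single byte at offset 49; in `FirstAttempt` it is the three bytes after the lone c_char.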

This is a problem when it comes to file formats and network protocols. To change it, set the packing to 1:

 ...
class BinaryHeader(BigEndianStructure):
    _pack_ = 1
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
...

The file will then be:

00000000 11 11 11 11 ....
00000004 22 22 22 22 """"
00000008 22 22 22 22 """"
0000000C 22 22 22 22 """"
00000010 22 22 22 22 """"
00000014 22 22 22 22 """"
00000018 22 22 22 22 """"
0000001C 22 22 22 22 """"
00000020 22 22 22 22 """"
00000024 33 33 33 33 3333
00000028 44 44 44 44 DDDD
0000002C 44 44 44 55 DDDU
00000030 66 77 77    fww 
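
To tie this back to the original readinto() question, here is a sketch that reads the 51-byte layout shown above into a packed structure. It assumes Python 3 and uses an in-memory `io.BytesIO` in place of a real file; the class name `PackedHeader` is chosen for illustration.

```python
import io
from ctypes import BigEndianStructure, c_byte, c_char, c_uint, c_ushort, sizeof

class PackedHeader(BigEndianStructure):
    _pack_ = 1  # must appear before _fields_ to take effect
    _fields_ = [
        ("sequence_number_4bytes", c_uint),
        ("ascii_text_32bytes", c_char * 32),
        ("timestamp_4bytes", c_uint),
        ("more_funky_numbers_7bytes", c_byte * 7),
        ("some_flags_1byte", c_byte),
        ("other_flags_1byte", c_byte),
        ("payload_length_2bytes", c_ushort),
    ]

# Same byte pattern as the hex dump above: 4 + 32 + 4 + 7 + 1 + 1 + 2 = 51 bytes.
data = (b"\x11" * 4 + b"\x22" * 32 + b"\x33" * 4 +
        b"\x44" * 7 + b"\x55" + b"\x66" + b"\x77\x77")

header = PackedHeader()
bytes_read = io.BytesIO(data).readinto(header)

print(bytes_read, sizeof(PackedHeader))
print(hex(header.sequence_number_4bytes), hex(header.payload_length_2bytes))
```

With the packing set to 1, the value returned by readinto() matches the on-disk record size, so consecutive records can be read one after another without drifting.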

As for struct, it won't make things any easier in your case. Unfortunately, it doesn't support nested tuples in the format string. For example:

>>> from struct import *
>>>
>>> data = '\x11\x11\x11\x11\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22
\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x22\x33
\x33\x33\x33\x44\x44\x44\x44\x44\x44\x44\x55\x66\x77\x77'
>>>
>>> BinaryHeader = Struct('>I32cI7BBBH')
>>>
>>> BinaryHeader.unpack(data)
(286331153, '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"'
, '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"', '"'
, '"', '"', 858993459, 68, 68, 68, 68, 68, 68, 68, 85, 102, 30583)
>>>

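This result can't be used with namedtuple directly; you would still have to pick values out by index. It would work if you could write something like '>I(32c)(I)(7B)(B)(B)H'. This feature (Extend struct.unpack to produce nested tuples) has been requested since 2003, but nothing has been done about it since.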
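
One possible workaround, not part of the answer above, is to use the `s` format code, which returns a whole run of bytes as a single item. The sketch below assumes Python 3; the `Header` namedtuple and its shortened field names are made up for illustration. The 7-byte field comes back as a raw byte string and still needs decoding.

```python
import struct
from collections import namedtuple

# '32s' and '7s' each produce ONE bytes object instead of 32 or 7 items.
HEADER = struct.Struct(">I32sI7sBBH")

Header = namedtuple(
    "Header",
    "sequence_number ascii_text timestamp funky_numbers "
    "some_flags other_flags payload_length",
)

data = (b"\x11" * 4 + b"\x22" * 32 + b"\x33" * 4 +
        b"\x44" * 7 + b"\x55" + b"\x66" + b"\x77\x77")

header = Header._make(HEADER.unpack(data))
print(HEADER.size)
print(header.sequence_number, header.funky_numbers, header.payload_length)
```

Because the format starts with `>`, struct uses standard sizes with no alignment padding, so the record is 51 bytes without any equivalent of `_pack_`.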