How can I understand the contents of a .pyc file?

Niy*_*n C 7 python disassembly pyc python-2.7

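I have a .pyc file. I need to understand its contents to learn how disassembly works in Python. In other words, how can I produce output like dis.dis(function) from the contents of a .pyc file?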

For example:

>>> def sqr(x):  
...     return x*x
...
>>> import dis
>>> dis.dis(sqr)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                0 (x)
              6 BINARY_MULTIPLY     
              7 RETURN_VALUE        

I need to get output like this from the .pyc file.

Mar*_*ers 19

A .pyc file contains some metadata followed by a marshaled code object. Load the code object and disassemble it with:

import dis, marshal, sys

# Header size: 8 bytes before 3.3, 12 bytes in 3.3-3.6 (source size added),
# 16 bytes from 3.7 on (PEP 552 added a flags field).
if sys.version_info >= (3, 7):
    header_size = 16
elif sys.version_info >= (3, 3):
    header_size = 12
else:
    header_size = 8

pycfile = sys.argv[1]  # path of the .pyc file to inspect

with open(pycfile, "rb") as f:
    metadata = f.read(header_size)  # magic number, timestamp, etc.
    code = marshal.load(f)          # the rest is a marshalled code object

dis.dis(code)
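The metadata in front of the marshaled code object can be decoded with `struct`. Below is a minimal sketch, assuming the Python 3.7+ layout from PEP 552: a 4-byte magic number, a 4-byte flags word, then either the source modification time and size or an 8-byte source hash, all little-endian. The function name `read_pyc` is hypothetical, and older interpreters use the shorter 8- or 12-byte headers.

```python
import marshal
import struct
import types


def read_pyc(path):
    """Split a .pyc file into its header fields and its code object.

    Assumes the Python 3.7+ header layout (PEP 552), 16 bytes in total.
    """
    with open(path, "rb") as f:
        magic = f.read(4)                          # identifies the bytecode version
        (flags,) = struct.unpack("<I", f.read(4))  # bit 0 set => hash-based pyc
        if flags & 0x1:
            # Hash-based pyc: 8-byte hash of the source file.
            source_info = {"source_hash": f.read(8).hex()}
        else:
            # Timestamp-based pyc: source mtime and source size, 32 bits each.
            mtime, size = struct.unpack("<II", f.read(8))
            source_info = {"mtime": mtime, "source_size": size}
        code = marshal.load(f)                     # everything after the header
    if not isinstance(code, types.CodeType):
        raise ValueError(f"{path} does not contain a code object")
    return magic, flags, source_info, code
```

Passing the returned code object to `dis.dis` should give the same listing as the snippet above. The header fields tell you which interpreter wrote the file and which source it was compiled from.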

Demo with the bisect module (Python 2.7):

>>> import bisect
>>> import dis, marshal
>>> import sys
>>> header_size = 12 if sys.version_info >= (3, 3) else 8
>>> with open(bisect.__file__, "rb") as f:
...     magic_and_timestamp = f.read(header_size)  # first 8 or 12 bytes are metadata
...     code = marshal.load(f)                     # rest is bytecode
... 
>>> dis.dis(code)
  1           0 LOAD_CONST               0 ('Bisection algorithms.')
              3 STORE_NAME               0 (__doc__)

  3           6 LOAD_CONST               1 (0)
              9 LOAD_CONST               8 (None)
             12 LOAD_CONST               2 (<code object insort_right at 0x106a459b0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 3>)
             15 MAKE_FUNCTION            2
             18 STORE_NAME               2 (insort_right)

 22          21 LOAD_NAME                2 (insort_right)
             24 STORE_NAME               3 (insort)

 24          27 LOAD_CONST               1 (0)
             30 LOAD_CONST               8 (None)
             33 LOAD_CONST               3 (<code object bisect_right at 0x106a45ab0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 24>)
             36 MAKE_FUNCTION            2
             39 STORE_NAME               4 (bisect_right)

 45          42 LOAD_NAME                4 (bisect_right)
             45 STORE_NAME               5 (bisect)

 47          48 LOAD_CONST               1 (0)
             51 LOAD_CONST               8 (None)
             54 LOAD_CONST               4 (<code object insort_left at 0x106a45bb0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 47>)
             57 MAKE_FUNCTION            2
             60 STORE_NAME               6 (insort_left)

 67          63 LOAD_CONST               1 (0)
             66 LOAD_CONST               8 (None)
             69 LOAD_CONST               5 (<code object bisect_left at 0x106a45cb0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 67>)
             72 MAKE_FUNCTION            2
             75 STORE_NAME               7 (bisect_left)

 89          78 SETUP_EXCEPT            14 (to 95)

 90          81 LOAD_CONST               6 (-1)
             84 LOAD_CONST               7 (('*',))
             87 IMPORT_NAME              8 (_bisect)
             90 IMPORT_STAR         
             91 POP_BLOCK           
             92 JUMP_FORWARD            17 (to 112)

 91     >>   95 DUP_TOP             
             96 LOAD_NAME                9 (ImportError)
             99 COMPARE_OP              10 (exception match)
            102 POP_JUMP_IF_FALSE      111
            105 POP_TOP             
            106 POP_TOP             
            107 POP_TOP             

 92         108 JUMP_FORWARD             1 (to 112)
        >>  111 END_FINALLY         
        >>  112 LOAD_CONST               8 (None)
            115 RETURN_VALUE        
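The demo above runs on Python 2.7, where `bisect.__file__` can point straight at the .pyc. On Python 3 it normally names the .py source. The cached bytecode lives under `__pycache__`, at the path `importlib.util.cache_from_source()` returns, and that cached file may be missing or stale. The following is a minimal sketch for Python 3.7+ that avoids this by compiling bisect.py into a throwaway .pyc with `py_compile` first. The helper name `load_module_code` is hypothetical.

```python
import bisect
import dis
import importlib.util
import marshal
import os
import py_compile
import tempfile


def load_module_code(module):
    """Compile a module's .py source to a throwaway .pyc and return its code object.

    Assumes Python 3.7+ (16-byte .pyc header) and a module loaded from a .py file.
    """
    with tempfile.TemporaryDirectory() as tmp:
        pyc_path = os.path.join(tmp, "module.pyc")
        py_compile.compile(module.__file__, cfile=pyc_path, doraise=True)
        with open(pyc_path, "rb") as f:
            f.read(16)              # skip magic number, flags and source metadata
            return marshal.load(f)  # the module's top-level code object


# Where the interpreter itself would normally cache bisect's bytecode:
print(importlib.util.cache_from_source(bisect.__file__))
code = load_module_code(bisect)
dis.dis(code)
```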

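Note that this is only the top-level code object that defines the module. If you want to analyze the functions it contains, you need to load the nested code objects from the top-level code.co_consts array. For example, the code object for the insort_right function is loaded by LOAD_CONST 2, so look up the code object at that index: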

>>> code.co_consts[2]
<code object insort_right at 0x106a459b0, file "/Users/mpieters/Development/Library/buildout.python/parts/opt/lib/python2.7/bisect.py", line 3>
>>> dis.dis(code.co_consts[2])
 12           0 LOAD_FAST                2 (lo)
              3 LOAD_CONST               1 (0)
              6 COMPARE_OP               0 (<)
              9 POP_JUMP_IF_FALSE       27

 13          12 LOAD_GLOBAL              0 (ValueError)
             15 LOAD_CONST               2 ('lo must be non-negative')
             18 CALL_FUNCTION            1
             21 RAISE_VARARGS            1
             24 JUMP_FORWARD             0 (to 27)

 14     >>   27 LOAD_FAST                3 (hi)
             30 LOAD_CONST               5 (None)
             33 COMPARE_OP               8 (is)
             36 POP_JUMP_IF_FALSE       54

 15          39 LOAD_GLOBAL              2 (len)
             42 LOAD_FAST                0 (a)
             45 CALL_FUNCTION            1
             48 STORE_FAST               3 (hi)
             51 JUMP_FORWARD             0 (to 54)

 16     >>   54 SETUP_LOOP              65 (to 122)
        >>   57 LOAD_FAST                2 (lo)
             60 LOAD_FAST                3 (hi)
             63 COMPARE_OP               0 (<)
             66 POP_JUMP_IF_FALSE      121

 17          69 LOAD_FAST                2 (lo)
             72 LOAD_FAST                3 (hi)
             75 BINARY_ADD          
             76 LOAD_CONST               3 (2)
             79 BINARY_FLOOR_DIVIDE 
             80 STORE_FAST               4 (mid)

 18          83 LOAD_FAST                1 (x)
             86 LOAD_FAST                0 (a)
             89 LOAD_FAST                4 (mid)
             92 BINARY_SUBSCR       
             93 COMPARE_OP               0 (<)
             96 POP_JUMP_IF_FALSE      108
             99 LOAD_FAST                4 (mid)
            102 STORE_FAST               3 (hi)
            105 JUMP_ABSOLUTE           57

 19     >>  108 LOAD_FAST                4 (mid)
            111 LOAD_CONST               4 (1)
            114 BINARY_ADD          
            115 STORE_FAST               2 (lo)
            118 JUMP_ABSOLUTE           57
        >>  121 POP_BLOCK           

 20     >>  122 LOAD_FAST                0 (a)
            125 LOAD_ATTR                3 (insert)
            128 LOAD_FAST                2 (lo)
            131 LOAD_FAST                1 (x)
            134 CALL_FUNCTION            2
            137 POP_TOP             
            138 LOAD_CONST               5 (None)
            141 RETURN_VALUE        
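Instead of reading the `LOAD_CONST` index off the listing, you can search `co_consts` recursively for a function by name. Below is a minimal sketch; `iter_code_objects` and `find_code` are hypothetical helper names, not part of `dis`. On Python 3.7+, `dis.dis()` already recurses into nested code objects by default (see its `depth` parameter), so this is mainly useful for picking out a single function.

```python
import dis
import types


def iter_code_objects(code):
    """Yield `code` and, depth-first, every code object nested in its co_consts."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def find_code(code, name):
    """Return the first code object whose co_name equals `name`, or None."""
    for candidate in iter_code_objects(code):
        if candidate.co_name == name:
            return candidate
    return None


source = "def outer():\n    def inner():\n        return 1\n    return inner\n"
module_code = compile(source, "<example>", "exec")
dis.dis(find_code(module_code, "inner"))
```

With a code object loaded from a .pyc, `find_code(code, "insort_right")` would stand in for `code.co_consts[2]`.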

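Personally, I would avoid trying to parse .pyc files with anything other than the matching Python version and the marshal module. The marshal format is essentially an internal serialization format that changes as Python's own needs change. New features such as list comprehensions, with statements, and async/await required additions to the format, and those are not documented anywhere other than in the C source code.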
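Because marshal data is tied to the interpreter version, a cautious loader can compare the file's 4-byte magic number with `importlib.util.MAGIC_NUMBER` (available since Python 3.4) before calling `marshal.load`. Below is a minimal sketch. The function name `load_code_checked` is hypothetical, and the 16-byte header assumes Python 3.7+.

```python
import importlib.util
import marshal


def load_code_checked(path):
    """Return the code object from a .pyc, but only if this interpreter's version wrote it.

    Assumes the Python 3.7+ 16-byte header; the magic number is its first 4 bytes.
    """
    with open(path, "rb") as f:
        header = f.read(16)
        if header[:4] != importlib.util.MAGIC_NUMBER:
            raise ValueError(
                f"{path}: magic {header[:4].hex()} does not match this interpreter "
                f"({importlib.util.MAGIC_NUMBER.hex()}); use the matching Python version"
            )
        return marshal.load(f)
```

Failing early on a magic-number mismatch is safer than letting `marshal.load` misread data written by a different version.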

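If you do go down this route and manage to read the code object by some means other than the marshal module, you will have to build the disassembly yourself from the code object's various attributes. See the dis module source for details on how to do this. For example, you have to use the co_firstlineno and co_lnotab attributes to build a map from bytecode offsets to line numbers.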

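  • There is a short blog post on the [`pyc file format`](http://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html). But I'm sure that to fully understand all of it you would have to dig into Python's source code (there is plenty of good documentation on this). (2 votes)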