Determine whether a Python function has changed

Question

Context

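I am trying to cache executions in a data-processing framework (kedro). To do this, I want to compute a unique hash for a Python function that tells me whether anything in the function body (or in the functions and modules that function calls) has changed. I looked into __code__.co_code. While it nicely ignores comments, whitespace, etc., it also does not change when two functions are clearly different. For example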

def a():
  a = 1
  return a

def b():
  b = 2
  return b

assert a.__code__.co_code != b.__code__.co_code

fails. So the bytecode of these two functions is identical.

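End goal: determine whether a function's code or any of its data inputs has changed. If neither has changed and the result already exists, skip the execution to save runtime.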
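
The skip-if-unchanged logic itself can be sketched in a few lines. This is an in-memory illustration only, not kedro's API: run_cached and _code_key are hypothetical names, the key covers only the function's own bytecode, constants and defaults (not the functions or modules it calls), and the inputs are assumed to be hashable.

```python
_results = {}  # hypothetical in-memory cache: key -> previously computed result


def _code_key(func):
    """Bytecode, constants and defaults: a rough stand-in for "the function's code"."""
    code = func.__code__
    return (code.co_code, code.co_consts, func.__defaults__)


def run_cached(func, *inputs):
    """Run func(*inputs) unless the same code already ran on the same inputs."""
    key = (func.__qualname__, _code_key(func), inputs)
    if key in _results:
        return _results[key]  # neither code nor inputs changed: skip execution
    result = func(*inputs)
    _results[key] = result
    return result
```

Persisting such a cache between pipeline runs additionally needs a key that is stable across interpreter processes.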

Question: how can I get a fingerprint of a function's code in Python?

Another idea suggested by a colleague:

import dis

def compare_instructions(func1, func2):
    """Compare the bytecode instructions of two functions."""
    func1_instructions = list(dis.get_instructions(func1))
    func2_instructions = list(dis.get_instructions(func2))

    # zip() silently stops at the shorter list, so compare the lengths first
    if len(func1_instructions) != len(func2_instructions):
        return False

    # compare the relevant attributes, ignoring line-number info (starts_line)
    for line1, line2 in zip(func1_instructions, func2_instructions):
        if (line1.opname != line2.opname
                or line1.opcode != line2.opcode
                or line1.arg != line2.arg
                or line1.argval != line2.argval
                or line1.argrepr != line2.argrepr
                or line1.offset != line2.offset):
            return False

    return True

This looks very much like a hack. Other tools (e.g. pytest-testmon) also try to solve this problem, but they seem to rely on a lot of heuristics.
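
If the dis route is taken anyway, the instruction list can be reduced to a single comparable key instead of asserting attribute by attribute. A minimal sketch, assuming CPython; instruction_key is a hypothetical helper name. It keeps only each instruction's opname and argval, so line numbers are ignored for the top-level body. For a nested function or lambda, argval is a code object, and CPython code objects also compare their line-number information, so moving a nested function would still change the key.

```python
import dis


def instruction_key(func):
    """Hypothetical helper: a comparable summary of a function's bytecode.

    Only the opcode name and the resolved argument are kept, so the
    line-number information that dis also reports is ignored.
    """
    return tuple(
        (instruction.opname, instruction.argval)
        for instruction in dis.get_instructions(func)
    )
```

In the common case the resulting tuple is hashable, so it can also serve directly as a dictionary key.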

Answer 1

Guy*_*emi 3

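__code__.co_code returns the bytecode, which refers to constants only by their index, not by their value. Apart from the constants, the two functions are identical.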

__code__.co_consts holds the constant values themselves, so it has to be included in the comparison.

assert a.__code__.co_code != b.__code__.co_code \
       or a.__code__.co_consts != b.__code__.co_consts
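
To see that the bytecode stores only an index, the LOAD_CONST instructions can be resolved against co_consts by hand. A small sketch, assuming CPython; loaded_constants is a hypothetical helper. The example uses a string constant because recent CPython releases may load small integers with a different instruction than LOAD_CONST.

```python
import dis


def loaded_constants(func):
    """Hypothetical helper: (index, value) for every LOAD_CONST in func.

    The instruction's arg is only an index; the value itself lives in
    func.__code__.co_consts.
    """
    consts = func.__code__.co_consts
    return [
        (instruction.arg, consts[instruction.arg])
        for instruction in dis.get_instructions(func)
        if instruction.opname == "LOAD_CONST"
    ]


def greet():
    message = "hello"
    return message


print(loaded_constants(greet))
```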

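A look at the inspect module highlights some other aspects of "sameness". For example, default arguments must be taken into account to make sure the functions below are treated as different, because defaults are stored on the function object (__defaults__), not in __code__.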

def a(a1, a2=1):
    return a1 * a2

def b(b1, b2=2):
    return b1 * b2
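
Default values live on the function object (__defaults__ for positional parameters, __kwdefaults__ for keyword-only ones), so a comparison key has to pick them up separately from __code__. A minimal sketch; code_key is a hypothetical helper name, and the defaults are assumed to be hashable (a list default, for example, would make hash() of the key fail).

```python
def code_key(func):
    """Hypothetical helper: bytecode, constants and default arguments of func."""
    code = func.__code__
    return (
        code.co_code,         # instructions; constants appear only as indexes
        code.co_consts,       # the constant values those indexes point to
        func.__defaults__,    # positional defaults, e.g. (1,) or (2,)
        func.__kwdefaults__,  # keyword-only defaults, or None if there are none
    )


def a(a1, a2=1):
    return a1 * a2


def b(b1, b2=2):
    return b1 * b2


print(code_key(a) != code_key(b))  # True: only the defaults differ
```

Parameter names (co_varnames) are deliberately left out, so renaming a1 to b1 on its own does not change the key.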

One way to compute a fingerprint is the built-in hash function. Assuming the same function definitions as in the OP's example:

def finger_print(func):
    return hash(func.__code__.co_consts) + hash(func.__code__.co_code)

assert finger_print(a) != finger_print(b)
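
For a cache that outlives a single process, hash() is not a good fit: hashes of str and bytes objects are randomized per interpreter run (controlled by PYTHONHASHSEED), so finger_print can return a different value the next time the pipeline starts. A sketch of a run-to-run stable alternative, assuming CPython and using hashlib.sha256; _normalize and stable_fingerprint are hypothetical names. It still covers only the function's own code and defaults, not the functions or modules it calls; the bytecode (and therefore the fingerprint) changes between Python versions; and a default value whose repr() contains a memory address would make the result unstable.

```python
import hashlib
import types


def _normalize(code: types.CodeType) -> tuple:
    """Turn a code object into nested tuples whose repr() is stable across runs."""
    consts = []
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            # Nested function, lambda or comprehension: recurse, because the
            # repr() of a code object contains a memory address and line number
            consts.append(_normalize(const))
        elif isinstance(const, frozenset):
            # Iteration order of a frozenset of strings depends on the hash seed
            consts.append(("frozenset", tuple(sorted(repr(item) for item in const))))
        else:
            consts.append(const)
    # co_names: global and attribute names the bytecode refers to by index
    return (code.co_code, code.co_names, tuple(consts))


def stable_fingerprint(func: types.FunctionType) -> str:
    """Hypothetical helper: SHA-256 of the normalized code plus default arguments."""
    key = (_normalize(func.__code__), func.__defaults__, func.__kwdefaults__)
    return hashlib.sha256(repr(key).encode("utf-8")).hexdigest()
```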

Posted:	5 years, 3 months ago
Views:	598
Last activity:	4 years, 5 months ago