Can I restore a function whose closure contains cycles in Python?

Eml*_*gan 6 python google-app-engine closures

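I'm trying to serialise Python functions (code + closures) and reinstate them later. I'm using the code at the bottom of this post.

It's very flexible code. It allows the serialisation and deserialisation of inner functions, and of functions that are closures, i.e. functions that need their context reinstated: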

def f1(arg):
    def f2():
        print arg

    def f3():
        print arg
        f2()

    return f3

x = SerialiseFunction(f1(stuff)) # a string
save(x) # save it somewhere

# later, possibly in a different process

x = load() # get it from somewhere 
newf3 = DeserialiseFunction(x)  # f1 returned f3, so this is a copy of f3
newf3() # prints value of "stuff" twice

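These calls work even if there are functions in the closures of your function, functions in the closures of those functions, and so on (we have a graph of closures, where closures contain functions that have closures that contain more functions, etc.).

However, it turns out that these graphs can contain cycles: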

def g1():
    def g2():
        g2()
    return g2

g = g1()

If I look at g2's closure (via g), I can see g2 itself in it:

>>> g
<function g2 at 0x952033c>
>>> g.func_closure[0].cell_contents
<function g2 at 0x952033c>
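The same check can be made programmatically. The following is a minimal sketch, not part of the original code: `closure_has_cycle` is a hypothetical helper name, and it uses the `__closure__` spelling (available from Python 2.6 onwards and in Python 3) rather than the 2.x-only `func_closure`.

```python
import types

def closure_has_cycle(func, _path=None):
    """Return True if walking func's closure graph leads back to a function already on the path."""
    path = _path or []
    if any(func is seen for seen in path):
        return True
    for cell in func.__closure__ or ():
        try:
            value = cell.cell_contents
        except ValueError:
            # The cell has not been filled in yet; nothing to follow.
            continue
        if isinstance(value, types.FunctionType):
            if closure_has_cycle(value, path + [func]):
                return True
    return False

def g1():
    def g2():
        g2()
    return g2

def f1(arg):
    def f2():
        return arg
    def f3():
        return f2()
    return f3

print(closure_has_cycle(g1()))         # True: g2 refers to itself
print(closure_has_cycle(f1("stuff")))  # False: f3 -> f2 -> arg, no loop
```

A serialiser that recurses over closures without a check of this kind will never terminate on `g`.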

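This causes a serious problem when I try to deserialise the function, because everything is immutable. What I need to do is build the function newg2: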

newg2 = types.FunctionType(g2code, globals, closure=newg2closure)

where newg2closure is created like this:

newg2closure = (make_cell(newg2),)

This of course can't be done; each line of code depends on the other. Cells are immutable, tuples are immutable, function types are immutable.
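To make the constraint concrete, here is a small sketch (an illustration, not code from the original post) showing that a function's closure cannot be rebound after the function exists, and that the closure tuple cannot be edited in place. Cells are left out because whether `cell_contents` can be assigned depends on the interpreter version; on Python 2.7 it is read-only.

```python
def outer():
    value = 1
    def inner():
        return value
    return inner

f = outer()
closure = f.__closure__

try:
    f.__closure__ = closure      # rebinding the closure attribute is rejected
    closure_is_writable = True
except (AttributeError, TypeError):
    # The exception type differs between Python versions, so catch both.
    closure_is_writable = False

try:
    closure[0] = closure[0]      # tuples do not support item assignment
    tuple_is_writable = True
except TypeError:
    tuple_is_writable = False

print(closure_is_writable)
print(tuple_is_writable)
```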

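So what I'm trying to find out is: is there a way to create newg2 above? Is there any way to create a function object that is referenced in its own closure graph?

I'm using Python 2.7 (I'm on App Engine, so I can't move to Python 3).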


For reference, my serialisation functions:

import logging
import marshal
import sys
import types

def SerialiseFunction(aFunction):
    if not aFunction or not isinstance(aFunction, types.FunctionType):
        raise Exception ("First argument required, must be a function")

    def MarshalClosureValues(aClosure):
        logging.debug(repr(aClosure))
        lmarshalledClosureValues = []
        if aClosure:
            lclosureValues = [lcell.cell_contents for lcell in aClosure]
            lmarshalledClosureValues = [
                [marshal.dumps(litem.func_code), MarshalClosureValues(litem.func_closure)] if hasattr(litem, "func_code")
                else [marshal.dumps(litem)] 
                for litem in lclosureValues
            ]
        return lmarshalledClosureValues

    lmarshalledFunc = marshal.dumps(aFunction.func_code)
    lmarshalledClosureValues = MarshalClosureValues(aFunction.func_closure)
    lmoduleName = aFunction.__module__

    lcombined = (lmarshalledFunc, lmarshalledClosureValues, lmoduleName)

    retval = marshal.dumps(lcombined)

    return retval


def DeserialiseFunction(aSerialisedFunction):
    lmarshalledFunc, lmarshalledClosureValues, lmoduleName = marshal.loads(aSerialisedFunction)

    lglobals = sys.modules[lmoduleName].__dict__

    def make_cell(value):
        return (lambda x: lambda: x)(value).func_closure[0]

    def UnmarshalClosureValues(aMarshalledClosureValues):
        lclosure = None
        if aMarshalledClosureValues:
            lclosureValues = [
                    marshal.loads(item[0]) if len(item) == 1 
                    else types.FunctionType(marshal.loads(item[0]), lglobals, closure=UnmarshalClosureValues(item[1])) 
                    for item in aMarshalledClosureValues if len(item) >= 1 and len(item) <= 2
                ]
            lclosure = tuple([make_cell(lvalue) for lvalue in lclosureValues])
        return lclosure

    lfunctionCode = marshal.loads(lmarshalledFunc)
    lclosure = UnmarshalClosureValues(lmarshalledClosureValues)
    lfunction = types.FunctionType(lfunctionCode, lglobals, closure=lclosure)
    return lfunction
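The two building blocks of the deserialiser are `make_cell` and the `closure=` argument of `types.FunctionType`. The sketch below isolates them. It is an illustration only: `template` and `captured` are made-up names, and `__closure__` / `__code__` are used in place of `func_closure` / `func_code` so that it runs on Python 2.6+ and Python 3 alike.

```python
import types

def make_cell(value):
    # The inner lambda closes over x, so its closure holds exactly one cell
    # whose contents are `value`.
    return (lambda x: lambda: x)(value).__closure__[0]

def template():
    captured = None
    def show():
        return captured
    return show

# A code object with one free variable ("captured").
code = template().__code__

# Rebuild a function from the bare code object, supplying a fresh closure.
# The closure tuple must have one cell per free variable of the code object.
rebuilt = types.FunctionType(code, globals(), closure=(make_cell("stuff"),))
print(rebuilt())  # stuff
```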

Eml*_*gan 3

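Here's an approach that works.

You can't fix up these immutable objects, but what you can do is put proxy functions in place of the circular references, and have them look up the real function in a global dictionary.

1: When serialising, keep track of all the functions you've seen. If you see the same one again, don't reserialise it; serialise a sentinel value instead.

I've used a set: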

lfunctionHashes = set()

For each item being serialised, check whether it's in the set. If it is, emit a sentinel; otherwise add it to the set and marshal it properly:

lhash = hash(litem)
if lhash in lfunctionHashes:
    lmarshalledClosureValues.append([lhash, None])
else:
    lfunctionHashes.add(lhash)
    lmarshalledClosureValues.append([lhash, marshal.dumps(litem.func_code), MarshalClosureValues(litem.func_closure, lfullIndex), litem.__module__])
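To see what this produces for the cyclic `g2` case, here is a simplified sketch. It is hypothetical: `describe_closure` is not in the original code, and it records the function's name where the real serialiser stores `marshal.dumps(...)` output, so the resulting structure stays readable.

```python
import types

def describe_closure(closure, seen):
    """Walk a closure, emitting [key, None] for functions already visited."""
    items = []
    for cell in closure or ():
        value = cell.cell_contents
        if isinstance(value, types.FunctionType):
            key = hash(value)
            if key in seen:
                # Sentinel: this function is already being serialised.
                items.append([key, None])
            else:
                seen.add(key)
                items.append([key, value.__name__,
                              describe_closure(value.__closure__, seen)])
        else:
            items.append([repr(value)])
    return items

def g1():
    def g2():
        g2()
    return g2

g = g1()
layout = describe_closure(g.__closure__, set())
print(layout == [[hash(g), "g2", [[hash(g), None]]]])  # True
```

The recursion stops at the second sighting of `g2`, which is what keeps the walk finite.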

2: When deserialising, keep a global dict of function hash: function

gfunctions = {}

During deserialisation, whenever you deserialise a function, add it to gfunctions. Here, item is (hash, code, closure values, module name):

lfunction = types.FunctionType(marshal.loads(item[1]), globals, closure=UnmarshalClosureValues(item[2]))
gfunctions[item[0]] = lfunction

When you come across a sentinel value for a function, use a proxy instead, passing in the function's hash:

lfunction = make_proxy(item[0])

Here's the proxy. It looks up the real function by its hash:

def make_proxy(f_hash):
    def f_proxy(*args, **kwargs):
        global gfunctions
        f = gfunctions[f_hash]
        return f(*args, **kwargs)

    return f_proxy
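The proxy works because the lookup happens at call time, so it can be created before the real function exists. A minimal sketch of that ordering, using a hypothetical `registry` dict and a string key in place of the global dict and the function hash:

```python
registry = {}  # hypothetical stand-in for the global hash -> function dict

def make_proxy(f_hash):
    def f_proxy(*args, **kwargs):
        # Look the target up when called, not when the proxy is created.
        return registry[f_hash](*args, **kwargs)
    return f_proxy

# The proxy is created first, while the registry is still empty.
proxy = make_proxy("countdown")

def countdown(n):
    return "done" if n == 0 else proxy(n - 1)

# The real function is registered afterwards, closing the loop.
registry["countdown"] = countdown

print(countdown(3))  # done
```

Returning the result of the call matters here: a proxy that calls the target without `return` would silently turn every return value into `None`.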

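I also had to make a few other changes:

  • I've used pickle instead of marshal in some places; I might look into that further
  • I include the function's module name in the serialisation along with the code and closure, so I can look up the correct globals for the function when deserialising.
  • In deserialisation, the length of the item tells you what you're deserialising: 1 for a simple value, 2 for a function that needs proxying, 4 for a fully serialised function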
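The length-based dispatch in the last point can be sketched on its own. `classify_item` and the kind names are hypothetical, introduced only to make the three cases explicit:

```python
def classify_item(item):
    """Map the length of a serialised closure item to what it represents."""
    kinds = {
        1: "value",     # [pickled value]
        2: "proxy",     # [hash, None] sentinel
        4: "function",  # [hash, code, closure values, module name]
    }
    try:
        return kinds[len(item)]
    except KeyError:
        raise ValueError("unexpected item length: %d" % len(item))

print(classify_item(["pickled"]))
print(classify_item([123, None]))
print(classify_item([123, "code", [], "__main__"]))
```

Raising on any other length is a choice made for this sketch; the deserialiser in this post simply skips such items.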

Here's the full new code.

import marshal
import pickle
import sys
import types

lfunctions = {}

def DeserialiseFunction(aSerialisedFunction):
    lmarshalledFunc, lmarshalledClosureValues, lmoduleName = pickle.loads(aSerialisedFunction)

    lglobals = sys.modules[lmoduleName].__dict__
    lglobals["lfunctions"] = lfunctions

    def make_proxy(f_hash):
        def f_proxy(*args, **kwargs):
            global lfunctions
            f = lfunctions[f_hash]
            return f(*args, **kwargs)

        return f_proxy

    def make_cell(value):
        return (lambda x: lambda: x)(value).func_closure[0]

    def UnmarshalClosureValues(aMarshalledClosureValues):
        global lfunctions

        lclosure = None
        if aMarshalledClosureValues:
            lclosureValues = []
            for item in aMarshalledClosureValues:
                ltype = len(item)
                if ltype == 1:
                    lclosureValues.append(pickle.loads(item[0]))
                elif ltype == 2:
                    lfunction = make_proxy(item[0])
                    lclosureValues.append(lfunction)
                elif ltype == 4:
                    lfuncglobals = sys.modules[item[3]].__dict__
                    lfuncglobals["lfunctions"] = lfunctions
                    lfunction = types.FunctionType(marshal.loads(item[1]), lfuncglobals, closure=UnmarshalClosureValues(item[2]))
                    lfunctions[item[0]] = lfunction
                    lclosureValues.append(lfunction)
            lclosure = tuple([make_cell(lvalue) for lvalue in lclosureValues])
        return lclosure

    lfunctionCode = marshal.loads(lmarshalledFunc)
    lclosure = UnmarshalClosureValues(lmarshalledClosureValues)
    lfunction = types.FunctionType(lfunctionCode, lglobals, closure=lclosure)
    return lfunction

def SerialiseFunction(aFunction):
    if not aFunction or not hasattr(aFunction, "func_code"):
        raise Exception ("First argument required, must be a function")

    lfunctionHashes = set()

    def MarshalClosureValues(aClosure, aParentIndices = []):
        lmarshalledClosureValues = []
        if aClosure:
            lclosureValues = [lcell.cell_contents for lcell in aClosure]

            lmarshalledClosureValues = []
            for index, litem in enumerate(lclosureValues):
                lfullIndex = list(aParentIndices)
                lfullIndex.append(index)

                if isinstance(litem, types.FunctionType):
                    lhash = hash(litem)
                    if lhash in lfunctionHashes:
                        lmarshalledClosureValues.append([lhash, None])
                    else:
                        lfunctionHashes.add(lhash)
                        lmarshalledClosureValues.append([lhash, marshal.dumps(litem.func_code), MarshalClosureValues(litem.func_closure, lfullIndex), litem.__module__])
                else:
                    lmarshalledClosureValues.append([pickle.dumps(litem)])

        return lmarshalledClosureValues

    lmarshalledFunc = marshal.dumps(aFunction.func_code)
    lmarshalledClosureValues = MarshalClosureValues(aFunction.func_closure)
    lmoduleName = aFunction.__module__

    lcombined = (lmarshalledFunc, lmarshalledClosureValues, lmoduleName)

    retval = pickle.dumps(lcombined)

    return retval
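For a quick end-to-end check, here is a condensed round trip of the same scheme. It is a sketch under some assumptions rather than a drop-in replacement: the names are shortened, `__code__` / `__closure__` replace `func_code` / `func_closure` so it runs on Python 2.6+ and 3, and it assumes everything lives in one module, so `globals()` stands in for the per-module lookup and the module name is not stored.

```python
import marshal
import pickle
import types

functions = {}  # hash -> rebuilt function, consulted by proxies at call time

def make_cell(value):
    return (lambda x: lambda: x)(value).__closure__[0]

def make_proxy(f_hash):
    def f_proxy(*args, **kwargs):
        return functions[f_hash](*args, **kwargs)
    return f_proxy

def dump_closure(closure, seen):
    items = []
    for cell in closure or ():
        value = cell.cell_contents
        if isinstance(value, types.FunctionType):
            key = hash(value)
            if key in seen:
                items.append([key, None])               # sentinel
            else:
                seen.add(key)
                items.append([key, marshal.dumps(value.__code__),
                              dump_closure(value.__closure__, seen)])
        else:
            items.append([pickle.dumps(value)])
    return items

def load_closure(items):
    if not items:
        return None
    values = []
    for item in items:
        if len(item) == 1:                              # simple value
            values.append(pickle.loads(item[0]))
        elif len(item) == 2:                            # sentinel -> proxy
            values.append(make_proxy(item[0]))
        else:                                           # full function
            func = types.FunctionType(marshal.loads(item[1]), globals(),
                                      closure=load_closure(item[2]))
            functions[item[0]] = func
            values.append(func)
    return tuple(make_cell(v) for v in values)

def serialise(func):
    return pickle.dumps((marshal.dumps(func.__code__),
                         dump_closure(func.__closure__, set())))

def deserialise(data):
    code, items = pickle.loads(data)
    return types.FunctionType(marshal.loads(code), globals(),
                              closure=load_closure(items))

def make_countdown():
    def countdown(n):
        return "done" if n == 0 else countdown(n - 1)
    return countdown

restored = deserialise(serialise(make_countdown()))
print(restored(3))  # done
```

As in the full code, the root function is not added to the seen set, so it is rebuilt twice: once as the returned object and once as the entry in its own closure. Calls still resolve correctly, because the inner copy reaches itself through the proxy.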