Can't pickle <class 'a class'>: attribute lookup inner class on a class failed

zha*_*hui 2 metaprogramming metaclass pickle python-3.x

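I am using PySpark to process some call data. As you can see, I dynamically add some inner classes to the class GetInfoFromCalls by using a metaclass. The code below lives in the package for_test, which exists on all nodes: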

class StatusField(object):
    """
    some alias.
    """
    failed = "failed"
    succeed = "succeed"
    status = "status"
    getNothingDefaultValue = "-999999"


class Result(object):
    """
    Result that store result and some info about it.
    """

    def __init__(self, result, status, message=None):
        self.result = result
        self.status = status
        self.message = message

structureList = [
    ("user_mobile", str, None),
    ("real_name", str, None),
    ("channel_attr", str, None),
    ("channel_src", str, None),
    ("task_data", dict, None),
    ("bill_info", list, "task_data"),
    ("account_info", list, "task_data"),
    ("payment_info", list, "task_data"),
    ("call_info", list, "task_data")
]

def inner_get(self, defaultValue=StatusField.getNothingDefaultValue):
    try:
        return self.holder.get(self)
    except Exception as e:
        # Log before returning; a print placed after the return never runs.
        print(e)
        return Result(defaultValue, StatusField.failed)

class call_meta(type):
    def __init__(cls, name, bases, attrs):

        for name_str, type_class, pLevel_str in structureList:
            setattr(cls, name_str, type(
                name_str,
                (object,),
                {})
                )

class GetInfoFromCalls(object, metaclass = call_meta):
    def __init__(self, call_deatails):
        for name_str, type_class, pLevel_str in structureList:
            inn = getattr(self.__class__, name_str)()
            object_dict = {
                "name": name_str,
                "type": type_class,
                "pLevel": None if pLevel_str is None else getattr(self, pLevel_str),
                "context": None,
                "get": inner_get,
                "holder": self,
            }
            for attr_str, real_attr in object_dict.items():
                setattr(inn, attr_str, real_attr)
            setattr(self, name_str, inn)

        self.call_details = call_deatails

When I run

import pickle

pickle.dumps(GetInfoFromCalls("foo"))

it raises the error quoted in the title: Can't pickle <class 'a class'>: attribute lookup inner class on a class failed.

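It seems I cannot pickle the inner classes because they are added dynamically by code. When the class is pickled, the inner classes do not exist, is that right?
I really do not want to write out these nearly identical classes by hand. Does anyone have a good way to avoid this problem?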

jsb*_*eno 7

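Python's pickle does not actually serialize classes: it serializes instances, and puts into the serialized data a reference to each instance's class. That reference relies on the class being bound to a name in a well-defined module. So instances of classes that are not bound to a module-level name, and instead live only as attributes of other classes or as data inside lists and dictionaries, typically will not work.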
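The by-name lookup described above can be seen in a small sketch. The module name pickle_demo_module and the helpers make_class and can_pickle are made up for this illustration; the module only stands in for an importable package such as for_test.

```python
import pickle
import sys
import types

# Hypothetical module standing in for an importable package such as for_test.
demo_module = types.ModuleType("pickle_demo_module")
sys.modules[demo_module.__name__] = demo_module


def make_class(name):
    """Create a class dynamically and mark it as belonging to demo_module."""
    return type(name, (object,), {"__module__": demo_module.__name__})


# Created dynamically, but never bound to a name inside demo_module.
Unbound = make_class("Unbound")

# Created the same way, then bound so pickle_demo_module.Bound resolves.
Bound = make_class("Bound")
setattr(demo_module, "Bound", Bound)


def can_pickle(obj):
    """Return True if pickle.dumps succeeds for obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


print(can_pickle(Bound()))    # the class is found by name in its module
print(can_pickle(Unbound()))  # the name lookup on the module fails
```

The only difference between the two classes is whether the name can be looked up on the module, which is the same lookup that fails for the classes created inside call_meta.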

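One straightforward thing to try is to use dill instead of pickle. It is a third-party package that works like pickle but has extensions for actually serializing arbitrary dynamic classes.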

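While using dill may help other people who land here, it does not fit your case: to use dill you would have to monkey-patch the underlying RPC mechanism PySpark uses so that it calls dill instead of pickle, and that may be neither trivial nor consistent enough for production use.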

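If the problem really is that dynamically created classes are unpickleable, what you can do is create an extra metaclass for the dynamic classes themselves, instead of using type, and implement the proper __getstate__ and __setstate__ on that metaclass (or other helper methods, as described in the pickle documentation). That might make these classes pickleable by ordinary pickle. In other words, use a separate metaclass with pickler helper methods instead of the type(..., (object, ), ...) call in your code.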
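A minimal sketch of that idea follows, under some assumptions. The names DynamicMeta, build_dynamic_class, and reduce_dynamic_class are hypothetical. The sketch also swaps in a reducer registered through copyreg.pickle rather than __getstate__ and __setstate__ on the metaclass, because the pickler saves instances of a metaclass by name unless a reducer is registered for that metaclass.

```python
import copyreg
import pickle


class DynamicMeta(type):
    """Hypothetical metaclass that only marks dynamically built classes."""


# Cache so that repeated unpickling in one process reuses the same class object.
_CLASS_CACHE = {}


def build_dynamic_class(name):
    """Return the dynamic class for name, creating it on first use."""
    if name not in _CLASS_CACHE:
        _CLASS_CACHE[name] = DynamicMeta(name, (object,), {})
    return _CLASS_CACHE[name]


def reduce_dynamic_class(cls):
    """Tell pickle to rebuild the class by calling build_dynamic_class(name)."""
    return build_dynamic_class, (cls.__name__,)


# Register the reducer for every class whose metaclass is DynamicMeta.
copyreg.pickle(DynamicMeta, reduce_dynamic_class)

user_mobile_cls = build_dynamic_class("user_mobile")
obj = user_mobile_cls()
obj.name = "user_mobile"

restored = pickle.loads(pickle.dumps(obj))
print(type(restored) is user_mobile_cls, restored.name)
```

Note that build_dynamic_class itself is pickled by reference, so it has to be importable on the unpickling side, just like any other module-level function.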

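However, "unpickleable object" is not the error you are getting. It is an attribute lookup error, which suggests the structure you are building is not good enough for pickle to introspect it and collect all the members of one of your instances. It is not (yet) related to the unpickleability of the class object. Since your dynamic classes live as attributes of the class (which is not itself pickled) and not of the instance, it is very likely that pickle does not care about them. Check the pickle documentation mentioned above; maybe all you need is the proper pickle helper methods on your class, with nothing different on the metaclass, for everything you have there to work properly.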
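One way to read the helper-method suggestion is a __reduce__ on the class itself, so that pickle stores only the constructor argument and lets __init__ rebuild the inner objects. The sketch below is a cut-down stand-in for the question's code, not the original: FIELD_NAMES and CallMeta are simplified, hypothetical versions of structureList and call_meta.

```python
import pickle

# Shortened, hypothetical stand-in for structureList.
FIELD_NAMES = ["user_mobile", "real_name"]


class CallMeta(type):
    """Adds one dynamically created inner class per field name."""

    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        for field_name in FIELD_NAMES:
            setattr(cls, field_name, type(field_name, (object,), {}))


class GetInfoFromCalls(metaclass=CallMeta):
    def __init__(self, call_details):
        for field_name in FIELD_NAMES:
            inner = getattr(type(self), field_name)()
            inner.holder = self
            setattr(self, field_name, inner)
        self.call_details = call_details

    def __reduce__(self):
        # Pickle only the constructor argument. On load, pickle calls
        # GetInfoFromCalls(call_details), and __init__ rebuilds the inner
        # objects, so their dynamic classes are never looked up by name.
        return (type(self), (self.call_details,))


restored = pickle.loads(pickle.dumps(GetInfoFromCalls("foo")))
print(restored.call_details)
```

This only works if everything worth keeping can be rebuilt from call_details; any state set on the inner objects after construction would be lost in the round trip.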