How do I decode a JSON string into a pydantic model with a DataFrame field?

Tom*_*ean 8 python pydantic

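I am using MongoDB to store the results of a script in a database. When I want to load the data back into Python, I need to decode the JSON (or BSON) string into a pydantic base model. For a pydantic model with JSON-compatible types, I can simply do: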

base_model = BaseModelClass.parse_raw(string)

But the default json.loads decoder doesn't know how to handle a DataFrame. I can override the .parse_raw function like this:

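import json
from io import StringIO

from pydantic import BaseModel
import pandas as pd

class BaseModelClass(BaseModel):
    df: pd.DataFrame

    class Config:
        arbitrary_types_allowed = True
        json_encoders = {
            pd.DataFrame: lambda df: df.to_json()
        }

    @classmethod
    def parse_raw(cls, data):
        data = json.loads(data)
        # 'df' was encoded as a JSON string; wrap it in StringIO because
        # newer pandas versions deprecate passing literal JSON to read_json
        data['df'] = pd.read_json(StringIO(data['df']))
        return cls(**data)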

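But ideally I would like fields of type pd.DataFrame to be decoded automatically, rather than changing the parse_raw function by hand every time. Is there a way to do something like: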

    class Config:
        arbitrary_types_allowed = True
        json_encoders = {
            pd.DataFrame: lambda df: df.to_json()
        }
        # hypothetical option, shown only to illustrate what I am looking for
        json_decoders = {
            pd.DataFrame: lambda s: pd.read_json(s)
        }

so that any field that should be a DataFrame is detected and converted to one, without having to modify the parse_raw() function?

Yaa*_*ler 2

Pydantic V2:

You can define a custom data type whose validator and serializer handle the conversion automatically:

from io import StringIO
from typing import Any

from pydantic import BaseModel, GetCoreSchemaHandler
import pandas as pd

from pydantic_core import CoreSchema, core_schema


class myDataFrame(pd.DataFrame):

    @classmethod
    def __get_pydantic_core_schema__(
            cls, source_type: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:

        # Use the same validator whether the input comes from JSON or from Python
        validate = core_schema.no_info_plain_validator_function(cls.try_parse_to_df)

        return core_schema.json_or_python_schema(
            json_schema=validate,
            python_schema=validate,
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda df: df.to_json()
            ),
        )

    @classmethod
    def try_parse_to_df(cls, value: Any):
        if isinstance(value, str):
            # Wrap in StringIO: newer pandas versions deprecate literal JSON input
            return pd.read_json(StringIO(value))
        return value


# Create a model with your custom type
class BaseModelClass(BaseModel):
    df: myDataFrame


# Create your model
sample_df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=["A", "B"])
my_model = BaseModelClass(df=sample_df)

# Should also be able to parse from json
my_model = BaseModelClass(df=sample_df.to_json())

# A full round trip through JSON works as well
my_model_2 = BaseModelClass.model_validate_json(my_model.model_dump_json())
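If you would rather not subclass pd.DataFrame, the same schema can probably be attached through typing.Annotated instead. The following is only a sketch under that assumption: the names _to_dataframe, _DataFrameAnnotation, DataFrameField and Results are made up for illustration, and unlike the code above it rejects values that are neither a DataFrame nor a JSON string.

```python
from io import StringIO
from typing import Annotated, Any

import pandas as pd
from pydantic import BaseModel, GetCoreSchemaHandler
from pydantic_core import CoreSchema, core_schema


def _to_dataframe(value: Any) -> pd.DataFrame:
    """Hypothetical helper: pass DataFrames through, parse JSON strings."""
    if isinstance(value, pd.DataFrame):
        return value
    if isinstance(value, str):
        # Assumes the string was produced by DataFrame.to_json()
        return pd.read_json(StringIO(value))
    raise ValueError("expected a pandas DataFrame or a JSON string")


class _DataFrameAnnotation:
    """Hypothetical marker class that carries the pydantic schema."""

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source_type: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        validate = core_schema.no_info_plain_validator_function(_to_dataframe)
        return core_schema.json_or_python_schema(
            json_schema=validate,
            python_schema=validate,
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda df: df.to_json()
            ),
        )


# Reusable alias: any field annotated with it is converted automatically
DataFrameField = Annotated[pd.DataFrame, _DataFrameAnnotation]


class Results(BaseModel):
    name: str
    df: DataFrameField


original = Results(name="run-1", df=pd.DataFrame({"A": [1, 3], "B": [2, 4]}))
restored = Results.model_validate_json(original.model_dump_json())
```

With this variant the field is annotated as a plain pd.DataFrame, so type checkers and editors see the usual DataFrame API, and the alias can be reused on any number of fields or models.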