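I'm using MongoDB to store the results of a script in a database. When I load the data back into Python, I need to decode the JSON (or BSON) string into a pydantic BaseModel. With a pydantic model that only has JSON-compatible types, I can do: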
base_model = BaseModelClass.parse_raw(string)
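For comparison, pydantic v2 deprecates parse_raw in favor of model_validate_json, which parses and validates the JSON string in one step. A minimal sketch, using a hypothetical RunResult model in place of BaseModelClass:

```python
from pydantic import BaseModel


class RunResult(BaseModel):
    # Hypothetical model whose fields are all JSON-compatible
    name: str
    scores: list[float]
    passed: bool


original = RunResult(name="run-1", scores=[0.5, 0.75], passed=True)
stored = original.model_dump_json()  # the string that would be saved to the database

# pydantic v2 counterpart of BaseModelClass.parse_raw(string)
restored = RunResult.model_validate_json(stored)
```

Because every field type maps directly onto JSON, no custom decoding is needed here; the DataFrame case below is where this breaks down.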
But the default json.loads decoder doesn't know how to handle a DataFrame. I can override the .parse_raw function like this:
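import io
import json

from pydantic import BaseModel
import pandas as pd


class BaseModelClass(BaseModel):
    df: pd.DataFrame

    class Config:
        arbitrary_types_allowed = True
        # Serialize DataFrames as a JSON string when calling .json()
        json_encoders = {
            pd.DataFrame: lambda df: df.to_json()
        }

    @classmethod
    def parse_raw(cls, data):
        data = json.loads(data)
        # The encoded frame arrives as a JSON string; turn it back into a DataFrame
        # (io.StringIO avoids the pandas >= 2.1 warning about literal JSON strings)
        data['df'] = pd.read_json(io.StringIO(data['df']))
        return cls(**data)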
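The override works because the encoder stores the whole frame as a single JSON string inside the outer document, so after json.loads the df value is a plain str that pd.read_json can turn back into a DataFrame. A minimal pandas-only sketch of that round trip (the variable names are illustrative, and it assumes the default to_json/read_json orientation):

```python
import io
import json

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

# What the json_encoders entry produces: the whole frame as one JSON string
encoded_df = df.to_json()

# Embedded in the outer document, the frame is a string value, not a nested object
document = json.dumps({"df": encoded_df})
data = json.loads(document)

# The decoding step from the parse_raw override; io.StringIO avoids the
# deprecation warning pandas >= 2.1 emits when read_json gets a literal JSON string
restored = pd.read_json(io.StringIO(data["df"]))
```

This double encoding is also why a generic decoder cannot tell a DataFrame field from an ordinary string field by looking at the JSON alone; the information has to come from the field's type.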
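Ideally, though, I'd like fields of type pd.DataFrame to be decoded automatically, instead of manually overriding parse_raw every time. Is there a way to do something like: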
class Config:
    arbitrary_types_allowed = True
    json_encoders = {
        pd.DataFrame: lambda df: df.to_json()
    }
    # Wished-for option: pydantic's Config does not provide json_decoders
    json_decoders = {
        pd.DataFrame: lambda s: pd.read_json(s)
    }
so that any field that should be a DataFrame is detected and converted to one, without modifying parse_raw()?
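You can define a custom data type whose core schema specifies both a validator and a serializer, so the conversion is handled automatically (this uses the pydantic v2 API):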
import io
from typing import Any

import pandas as pd
from pydantic import BaseModel, GetCoreSchemaHandler
from pydantic_core import CoreSchema, core_schema


class myDataFrame(pd.DataFrame):
    @classmethod
    def __get_pydantic_core_schema__(
        cls, source_type: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        # The same validator runs for Python input and for JSON input
        validate = core_schema.no_info_plain_validator_function(cls.try_parse_to_df)
        return core_schema.json_or_python_schema(
            json_schema=validate,
            python_schema=validate,
            # Serialize the frame as a JSON string
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda df: df.to_json()
            ),
        )

    @classmethod
    def try_parse_to_df(cls, value: Any):
        # Strings are treated as DataFrame.to_json() output; anything else passes through
        if isinstance(value, str):
            return pd.read_json(io.StringIO(value))
        return value


# Create a model with your custom type
class BaseModelClass(BaseModel):
    df: myDataFrame


# Create your model
sample_df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=["A", "B"])
my_model = BaseModelClass(df=sample_df)

# Should also be able to parse from JSON
my_model = BaseModelClass(df=sample_df.to_json())

# Even more dramatically: a full JSON round trip
my_model_2 = BaseModelClass.model_validate_json(my_model.model_dump_json())
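The custom type is what covers the "any field" part of the question: every field annotated with it is converted, with no per-model parse_raw. A sketch under a few assumptions: JsonDataFrame and Results are hypothetical names, the validator rejects input that is neither a DataFrame nor a string with a ValueError (which pydantic reports as a ValidationError), and when_used="json" is a design choice so that model_dump() keeps real DataFrame objects while model_dump_json() still emits JSON strings.

```python
import io
from typing import Any

import pandas as pd
from pydantic import BaseModel, GetCoreSchemaHandler
from pydantic_core import CoreSchema, core_schema


class JsonDataFrame(pd.DataFrame):
    """Hypothetical variant of the answer's myDataFrame type."""

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source_type: Any, handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        validate = core_schema.no_info_plain_validator_function(cls._to_df)
        return core_schema.json_or_python_schema(
            json_schema=validate,
            python_schema=validate,
            # Only convert to a JSON string in JSON mode; model_dump() keeps the DataFrame
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda df: df.to_json(), when_used="json"
            ),
        )

    @classmethod
    def _to_df(cls, value: Any) -> pd.DataFrame:
        if isinstance(value, str):
            return pd.read_json(io.StringIO(value))
        if isinstance(value, pd.DataFrame):
            return value
        # ValueError is turned into a pydantic ValidationError
        raise ValueError(f"expected a DataFrame or JSON string, got {type(value).__name__}")


class Results(BaseModel):
    # Every field annotated with JsonDataFrame is decoded automatically
    run_id: str
    train: JsonDataFrame
    test: JsonDataFrame


train = pd.DataFrame({"A": [1, 3], "B": [2, 4]})
test = pd.DataFrame({"A": [5, 7], "B": [6, 8]})
original = Results(run_id="r1", train=train, test=test)

payload = original.model_dump_json()  # the string that would be stored in MongoDB
restored = Results.model_validate_json(payload)
```

Whether the stricter validator and the JSON-only serializer are desirable depends on how the models are used elsewhere; the answer's pass-through version is more permissive.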