serde: speeding up custom enum deserialization

Question

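My program parses a fairly large JSON document (30 MB). On machines with a slow CPU this takes 70 ms, and I would like to speed it up. I found that 27% of the parsing time is spent in my foo_document_type_deserialize function. Is there a way to skip the String allocation here: let s = String::deserialize(deserializer)?;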

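I am completely sure that the strings representing the enum values contain no special JSON characters such as \b \f \n \r \t \" \\, so working with the unescaped string should be safe.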

use serde::{Deserialize, Deserializer};

#[derive(Deserialize, Debug, Clone)]
#[serde(rename_all = "camelCase")]
pub struct FooDocument {
    // other fields...
    #[serde(rename = "type")]
    #[serde(deserialize_with = "foo_document_type_deserialize")]
    doc_type: FooDocumentType,
}

fn foo_document_type_deserialize<'de, D>(deserializer: D) -> Result<FooDocumentType, D::Error>
where
    D: Deserializer<'de>,
{
    use self::FooDocumentType::*;
    let s = String::deserialize(deserializer)?;
    match s.as_str() {
        "tir lim bom bom" => Ok(Var1),
        "hgga;hghau" => Ok(Var2),
        "hgueoqtyhit4t" => Ok(Var3),
        "Text" | "Type not detected" | "---" => Ok(Unknown),
        _ => Err(serde::de::Error::custom(format!(
            "Unsupported foo document type '{}'",
            s
        ))),
    }
}

#[derive(Debug, Clone, Copy)]
pub enum FooDocumentType {
    Unknown,
    Var1,
    Var2,
    Var3,
}


Answer 1

dto*_*nay 6

The custom impl you wrote has a form that serde_derive can generate for you:

// Clone and Copy are kept so that FooDocument, which derives Clone, still compiles.
#[derive(Deserialize, Debug, Clone, Copy)]
pub enum FooDocumentType {
    #[serde(rename = "Text", alias = "Type not detected", alias = "---")]
    Unknown,
    #[serde(rename = "tir lim bom bom")]
    Var1,
    #[serde(rename = "hgga;hghau")]
    Var2,
    #[serde(rename = "hgueoqtyhit4t")]
    Var3,
}


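The generated derive code does not allocate memory, and in a quick microbenchmark it was about 2x faster than your code when I measured the following: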

serde_json::from_str::<FooDocument>(r#"{"type":"hgga;hghau"}"#).unwrap()

