What is the best PrefixStyle in protobuf-net for large files?

Question

I need to store large data (on the order of gigabytes) for streaming, using protobuf-net 2.4.0.

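Currently, my strategy is to write a small header with SerializeWithLengthPrefix using PrefixStyle.Base128, and then write the large body with the standard protobuf serialization method, using the code below. It works like a charm.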

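private void Serialize(Stream stream)
{
    // Small header: length-prefixed so the reader knows where it ends.
    Model.SerializeWithLengthPrefix(stream, FileHeader, typeof(FileHeader), PrefixStyle.Base128, 1);

    if (FileHeader.SerializationMode == serializationType.Compressed)
    {
        // leaveOpen = true: disposing the gzip wrapper does not close the underlying stream.
        using (var gzip = new GZipStream(stream, CompressionMode.Compress, true))
        using (var bs = new BufferedStream(gzip, GZIP_BUFFER_SIZE))
        {
            Model.Serialize(bs, FileBody);
        }
    }
    else
    {
        // Large body: written without a prefix, so it runs to the end of the stream.
        Model.Serialize(stream, FileBody);
    }
}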

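Now I need to split the body into two different objects, so I have to use the LengthPrefix approach for them as well, but I don't know which PrefixStyle is best in this scenario. Can I keep using Base128? What does "useful for compatibility" mean in the description of Fixed32?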
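For context, the two styles differ only in how the length itself is laid out in the stream. The sketch below is hand-rolled C#, not protobuf-net's own code; `PrefixSketch` and its method names are hypothetical and exist only to show the bytes each style would produce for a given length.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: hand-rolled encoders, not protobuf-net's implementation.
public static class PrefixSketch
{
    // Base128 ("varint"): 7 payload bits per byte, least significant group first.
    // The high bit is set on every byte except the last.
    public static byte[] Base128(ulong length)
    {
        var bytes = new List<byte>();
        while (length >= 0x80)
        {
            bytes.Add((byte)((length & 0x7F) | 0x80)); // low 7 bits plus continuation flag
            length >>= 7;
        }
        bytes.Add((byte)length); // final byte, continuation flag clear
        return bytes.ToArray();
    }

    // Fixed32: always 4 bytes, little-endian, so the length must fit in 32 bits.
    public static byte[] Fixed32(uint length)
    {
        return new[]
        {
            (byte)(length & 0xFF),
            (byte)((length >> 8) & 0xFF),
            (byte)((length >> 16) & 0xFF),
            (byte)((length >> 24) & 0xFF)
        };
    }
}
```

For a 300-byte payload, `PrefixSketch.Base128(300)` gives `AC 02` and `PrefixSketch.Fixed32(300)` gives `2C 01 00 00`. As far as I understand it, when Base128 is used with a non-zero field number, a field header byte is written before the varint (`0A` for field 1), which makes each record look like a length-delimited field of an enclosing protobuf message.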

Update

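I found this post in which Marc Gravell explains that there is an option to use a start marker and an end marker, but I'm not sure whether it can be combined with the LengthPrefix methods. To be clearer, is the approach shown in the code below valid?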

[ProtoContract]
public class FileHeader
{
    [ProtoMember(1)]
    public int Version { get; }
    [ProtoMember(2)]
    public string Author { get; set; }
    [ProtoMember(3)]
    public string Organization { get; set; }
}

[ProtoContract(IsGroup = true)] // can IsGroup=true help with LengthPrefix for big data?
public class FileBody1
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public List<Foo1> Foo1s { get; }
    [ProtoMember(2, DataFormat = DataFormat.Group)]
    public List<Foo2> Foo2s { get; }
    [ProtoMember(3, DataFormat = DataFormat.Group)]
    public List<Foo3> Foo3s { get; }
}

[ProtoContract(IsGroup = true)] // can IsGroup=true help with LengthPrefix for big data?
public class FileBody2
{
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public List<Foo4> Foo4s { get; }
    [ProtoMember(2, DataFormat = DataFormat.Group)]
    public List<Foo5> Foo5s { get; }
    [ProtoMember(3, DataFormat = DataFormat.Group)]
    public List<Foo6> Foo6s { get; }
}

public static class Helper
{
    private static void SerializeFile(Stream stream, FileHeader header, FileBody1 body1, FileBody2 body2)
    {
        var model = RuntimeTypeModel.Create();

        var serializationContext = new ProtoBuf.SerializationContext();

        model.SerializeWithLengthPrefix(stream, header, typeof(FileHeader), PrefixStyle.Base128, 1);
        model.SerializeWithLengthPrefix(stream, body1, typeof(FileBody1), PrefixStyle.Base128, 1, serializationContext);
        model.SerializeWithLengthPrefix(stream, body2, typeof(FileBody2), PrefixStyle.Base128, 1, serializationContext);
    }

    private static void DeserializeFile(Stream stream, ref FileHeader header, ref FileBody1 body1, ref FileBody2 body2)
    {
        var model = RuntimeTypeModel.Create();

        var serializationContext = new ProtoBuf.SerializationContext();

        header = model.DeserializeWithLengthPrefix(stream, null, typeof(FileHeader), PrefixStyle.Base128, 1) as FileHeader;
        body1 =  model.DeserializeWithLengthPrefix(stream, null, typeof(FileBody1), PrefixStyle.Base128, 1, null, out _, out _, serializationContext) as FileBody1;
        body2 =  model.DeserializeWithLengthPrefix(stream, null, typeof(FileBody2), PrefixStyle.Base128, 1, null, out _, out _, serializationContext) as FileBody2;
        
    }
}

If so, I think I can keep storing large data without worrying about the length prefix (I mean the marker that indicates the message length).

Answer 1

Mar*_*ell 1

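Base128 is probably the best general-purpose choice, simply because it retains protocol compatibility (the others don't). However, I would suggest that for very large files, using "group" mode on the collections (and on sub-objects in general) may be very desirable; it makes serialization faster, because no length prefixes need to be calculated for large object graphs.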
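To illustrate why groups avoid the length calculation, here is a minimal sketch of the two wire layouts for a sub-object. It is hand-written against the protobuf wire format (wire type 2 = length-delimited, 3 = start group, 4 = end group), not taken from protobuf-net; `GroupSketch` and its methods are hypothetical names, and the single-byte tag only holds for field numbers 1 to 15.

```csharp
using System;
using System.IO;

// Illustrative only: shows the wire layout, not protobuf-net's implementation.
public static class GroupSketch
{
    const int LengthDelimited = 2, StartGroup = 3, EndGroup = 4;

    // Field header: (field number << 3) | wire type. One byte is enough for fields 1..15.
    static byte Tag(int field, int wireType) => (byte)((field << 3) | wireType);

    // Length-delimited: the payload size must be known before the payload is written,
    // so the whole sub-object has to be measured or buffered first.
    public static void WriteLengthDelimited(Stream dest, int field, byte[] payload)
    {
        dest.WriteByte(Tag(field, LengthDelimited));
        uint n = (uint)payload.Length;
        while (n >= 0x80) { dest.WriteByte((byte)((n & 0x7F) | 0x80)); n >>= 7; }
        dest.WriteByte((byte)n);
        dest.Write(payload, 0, payload.Length);
    }

    // Group: a start marker, the payload streamed directly, then an end marker.
    // No size is needed up front. The payload must itself consist of valid
    // protobuf fields so that a reader can locate the end marker.
    public static void WriteGroup(Stream dest, int field, Action<Stream> writePayload)
    {
        dest.WriteByte(Tag(field, StartGroup));
        writePayload(dest);
        dest.WriteByte(Tag(field, EndGroup));
    }
}
```

With a payload of `08 01` (field 1, varint value 1) in field 1, the length-delimited form is `0A 02 08 01` and the group form is `0B 08 01 0C`. The group form costs an extra marker byte here, but the writer never has to know how large the payload is.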
