C# partial UTF-8 byte stream conversion

Ear*_*ine 2 c# utf-8 character-encoding

I wrote the following simple test:

[Test]
public void TestUTF8()
{
    var c = "abc€def"; // € (U+20AC) is 3 bytes in UTF-8, so c encodes to 9 bytes
    var b = Encoding.UTF8.GetBytes(c);

    Assert.That(b.Length, Is.EqualTo(9));
    // Assume you are reading a byte stream and have received only the first 5 bytes so far
    var p = Encoding.UTF8.GetChars(b, 0, 5);
    Trace.WriteLine(new string(p));
    Assert.That(p.Length, Is.EqualTo(3));
}

Trace outputs abc followed by a replacement character (shown as abc?), and the last assertion fails because p.Length is 4.

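However, I need Trace to output abc and the last assertion to pass. In practice I know the stream contains only valid characters, so when the last few bytes do not yet form a complete character, I just want to leave them there until more data arrives.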

So how can I do this in C#?

Mik*_*ray 5

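Encoding.GetChars isn't really designed for bytes coming from a stream, because decoding has to keep track of some state: a single character can span multiple buffer segments. To make this work, you should use a Decoder obtained from Encoding.GetDecoder. Decoder.Convert is rather low-level: it lets you control the input and output buffers, but it is somewhat harder to use. Decoder.GetChars is a bit easier to use and, importantly, stores state between calls. We can easily extend Peter Duniho's answer to arbitrary buffer sizes: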

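using System;
using System.IO;
using System.Text;

public static class Program
{
    public static void Main(string[] args)
    {
        var c = "abc€def"; // € (U+20AC) is 3 bytes in UTF-8
        var b = Encoding.UTF8.GetBytes(c);
        var result = DecodeFromStream(new MemoryStream(b), Encoding.UTF8, 3);
        Console.WriteLine(result);
        Console.WriteLine(c == result);
    }

    private static string DecodeFromStream(Stream dataStream, Encoding encoding, int bufferSize)
    {
        // One Decoder for the whole stream: it keeps an incomplete multi-byte
        // sequence from one Read and completes it with the bytes of the next.
        Decoder decoder = encoding.GetDecoder();
        StringBuilder sb = new StringBuilder();
        int inputByteCount;
        byte[] inputBuffer = new byte[bufferSize];
        char[] charBuffer = new char[encoding.GetMaxCharCount(inputBuffer.Length)];

        while ((inputByteCount = dataStream.Read(inputBuffer, 0, inputBuffer.Length)) > 0)
        {
            // This overload does not flush, so trailing partial bytes stay in the decoder
            int readChars = decoder.GetChars(inputBuffer, 0, inputByteCount, charBuffer, 0);
            if (readChars > 0)
                sb.Append(charBuffer, 0, readChars);
        }

        // End of stream: flush so the bytes of a truncated final character are
        // not silently dropped (they are emitted as U+FFFD instead)
        int flushedChars = decoder.GetChars(inputBuffer, 0, 0, charBuffer, 0, true);
        sb.Append(charBuffer, 0, flushedChars);

        return sb.ToString();
    }
}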
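The question's actual scenario, where bytes arrive piecemeal and an incomplete trailing character should wait for more data, can be sketched by keeping one Decoder alive between chunks rather than decoding a whole stream in one function. Utf8ChunkDecoder, Push and Finish are hypothetical names, not framework APIs; the sketch assumes UTF-8 input and relies only on Decoder.GetChars and its flush parameter.

```csharp
using System;
using System.Text;

// Hypothetical helper (not a framework type): keeps a single UTF-8 Decoder alive
// between chunks so that only complete characters are returned.
public sealed class Utf8ChunkDecoder
{
    private readonly Decoder _decoder = Encoding.UTF8.GetDecoder();

    // Decodes one chunk. With flush: false, an incomplete multi-byte sequence at
    // the end of the chunk stays inside the decoder and is completed by the bytes
    // passed to the next call.
    public string Push(byte[] buffer, int offset, int count)
    {
        // GetMaxCharCount returns a worst-case output size for count input bytes.
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(count)];
        int charCount = _decoder.GetChars(buffer, offset, count, chars, 0, false);
        return new string(chars, 0, charCount);
    }

    // Call once no more data will arrive. flush: true emits any bytes still held
    // by the decoder; bytes that never formed a valid character become U+FFFD.
    public string Finish()
    {
        // A UTF-8 decoder holds at most 3 leftover bytes, so 4 chars is ample room.
        char[] chars = new char[4];
        int charCount = _decoder.GetChars(Array.Empty<byte>(), 0, 0, chars, 0, true);
        return new string(chars, 0, charCount);
    }
}
```

With the 9 bytes of "abc€def", pushing the first 5 bytes returns "abc" (the first two bytes of € stay inside the decoder), and pushing the remaining 4 returns "€def", which is the behavior the question asks for.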
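For comparison, here is a rough sketch of the lower-level Decoder.Convert route the answer mentions, where the caller picks both buffer sizes and must loop until Convert reports completed. ConvertDemo, DecodeWithConvert and Drain are hypothetical names; the sketch assumes an output buffer of at least 2 chars, so that a character outside the BMP (decoded as a surrogate pair) always fits, since Convert throws when not even one character fits.

```csharp
using System.IO;
using System.Text;

public static class ConvertDemo
{
    // Hypothetical helper: decodes a whole stream with Decoder.Convert using
    // caller-chosen input and output buffer sizes (outputSize should be >= 2).
    public static string DecodeWithConvert(Stream stream, Encoding encoding, int inputSize, int outputSize)
    {
        Decoder decoder = encoding.GetDecoder();
        byte[] input = new byte[inputSize];
        char[] output = new char[outputSize];
        var sb = new StringBuilder();

        int read;
        while ((read = stream.Read(input, 0, input.Length)) > 0)
        {
            // flush: false keeps a trailing partial sequence inside the decoder.
            Drain(decoder, input, read, output, sb, false);
        }

        // End of stream: flush so leftover bytes are emitted (as U+FFFD if incomplete).
        Drain(decoder, input, 0, output, sb, true);
        return sb.ToString();
    }

    // Feeds input[0..count) to the decoder, emptying the output buffer into sb
    // as many times as needed until Convert reports completed.
    private static void Drain(Decoder decoder, byte[] input, int count, char[] output,
                              StringBuilder sb, bool flush)
    {
        int offset = 0;
        bool completed;
        do
        {
            decoder.Convert(input, offset, count - offset, output, 0, output.Length,
                            flush, out int bytesUsed, out int charsUsed, out completed);
            sb.Append(output, 0, charsUsed);
            offset += bytesUsed;
        } while (!completed);
    }
}
```

Unlike the GetChars loop, this version never sizes its output buffer from GetMaxCharCount; the cost is the inner loop and the bytesUsed bookkeeping. With inputSize 2, the € in "abc€def" is split across two reads and still decodes correctly.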