Ear*_*ine 2 c# utf-8 character-encoding
I wrote the following simple test:
    [Test]
    public void TestUTF8()
    {
        // '€' is used here as a 3-byte UTF-8 character, so the string encodes to 9 bytes.
        var c = "abc€def";
        var b = Encoding.UTF8.GetBytes(c);
        Assert.That(b.Length, Is.EqualTo(9));
        // Assume we are reading a byte stream and have received only the first 5 bytes so far.
        var p = Encoding.UTF8.GetChars(b, 0, 5);
        Trace.WriteLine(new string(p));
        Assert.That(p.Length, Is.EqualTo(3));
    }
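The Trace output is abc? and the final assertion fails, because p.Length is 4.
However, I need Trace to output abc and the last assertion to pass. I know the stream will contain valid characters, so when the last few bytes do not yet form a complete character, they should simply be left in place until more data arrives.
So how can this be achieved in C#?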
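Encoding.GetChars is not really designed for bytes coming from a stream, because decoding has to track some state: a single character may span more than one buffer segment. To make this work, you should use the Decoder obtained from Encoding.GetDecoder. Decoder.Convert is quite low-level, though: it gives you control over both the input and output buffers and is somewhat difficult to use. Decoder.GetChars is a bit easier to use and, importantly, it stores state between calls. We can easily extend Peter Duniho's answer to arbitrary buffer sizes: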
    using System;
    using System.IO;
    using System.Text;

    public static class Program
    {
        public static void Main(string[] args)
        {
            // '€' is used here as a 3-byte UTF-8 character, so it gets split across 3-byte reads.
            var c = "abc€def";
            var b = Encoding.UTF8.GetBytes(c);
            var result = DecodeFromStream(new MemoryStream(b), Encoding.UTF8, 3);
            Console.WriteLine(result);
            Console.WriteLine(c == result);
        }

        private static string DecodeFromStream(Stream dataStream, Encoding encoding, int bufferSize)
        {
            // The decoder remembers the bytes of an incomplete character between GetChars calls.
            Decoder decoder = encoding.GetDecoder();
            StringBuilder sb = new StringBuilder();
            int inputByteCount;
            byte[] inputBuffer = new byte[bufferSize];
            char[] charBuffer = new char[encoding.GetMaxCharCount(inputBuffer.Length)];
            while ((inputByteCount = dataStream.Read(inputBuffer, 0, inputBuffer.Length)) > 0)
            {
                int readChars = decoder.GetChars(inputBuffer, 0, inputByteCount, charBuffer, 0);
                if (readChars > 0)
                    sb.Append(charBuffer, 0, readChars);
            }
            return sb.ToString();
        }
    }
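
To tie this back to the test in the question, here is a minimal sketch of the same scenario without a stream: the first 5 bytes are decoded, then the rest. The class name PartialDecodeDemo and the helper DecodeInTwoParts are made up for illustration, and '€' (written as \u20AC) is only an assumed stand-in for any character that takes 3 bytes in UTF-8.

```csharp
using System;
using System.Text;

public static class PartialDecodeDemo
{
    // Hypothetical helper: decodes bytes[0..split) and then bytes[split..) with one
    // shared Decoder, so an incomplete trailing character is held back by the decoder
    // and completed by the second call.
    public static (string first, string second) DecodeInTwoParts(byte[] bytes, int split)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(bytes.Length)];

        // This overload does not flush, so leftover bytes stay in the decoder's state.
        int n1 = decoder.GetChars(bytes, 0, split, chars, 0);
        string first = new string(chars, 0, n1);

        int n2 = decoder.GetChars(bytes, split, bytes.Length - split, chars, 0);
        string second = new string(chars, 0, n2);

        return (first, second);
    }

    public static void Main()
    {
        // "abc" (3 bytes) + U+20AC (3 bytes) + "def" (3 bytes) = 9 bytes.
        byte[] b = Encoding.UTF8.GetBytes("abc\u20ACdef");

        var (first, second) = DecodeInTwoParts(b, 5);
        Console.WriteLine(first.Length);   // 3: only "abc" is produced from the first 5 bytes
        Console.WriteLine(second.Length);  // 4: the completed 3-byte character plus "def"
    }
}
```

The point is that the Decoder instance has to be kept alive and reused for every chunk. Calling Encoding.UTF8.GetChars on each chunk starts from scratch every time, which is what produced the extra replacement character in the question.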
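
For comparison, this is roughly what the lower-level Decoder.Convert route looks like when the output buffer is smaller than the decoded text. It is only a sketch: ConvertSketch and DecodeWithConvert are hypothetical names, and the char buffer is assumed to hold at least 2 chars so that a surrogate pair always fits.

```csharp
using System;
using System.Text;

public static class ConvertSketch
{
    // Hypothetical helper: pushes all of `bytes` through `decoder` using a small,
    // caller-sized char buffer. Convert reports how much input it consumed and how
    // much output it produced, so we loop until it says the input is used up.
    public static string DecodeWithConvert(Decoder decoder, byte[] bytes, int charBufferSize)
    {
        var sb = new StringBuilder();
        var chars = new char[charBufferSize];   // assumed to be >= 2
        int offset = 0;
        bool completed = false;

        while (!completed)
        {
            // flush: false keeps any incomplete trailing character in the decoder
            // so that a later call with more bytes can finish it.
            decoder.Convert(bytes, offset, bytes.Length - offset,
                            chars, 0, chars.Length, false,
                            out int bytesUsed, out int charsUsed, out completed);
            sb.Append(chars, 0, charsUsed);
            offset += bytesUsed;
        }
        return sb.ToString();
    }
}
```

Called once per chunk read from the stream with the same decoder, this gives the same result as the GetChars loop. The extra bookkeeping only pays off when you cannot size the char buffer with GetMaxCharCount.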