Gar*_*ler 103 c# xml utf-8 xml-serialization
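Proper object disposal has been removed for brevity, but I'd be shocked if this is the simplest way to encode an object as UTF-8 in memory. There has to be an easier way, doesn't there?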
var serializer = new XmlSerializer(typeof(SomeSerializableObject));
var memoryStream = new MemoryStream();
var streamWriter = new StreamWriter(memoryStream, System.Text.Encoding.UTF8);
serializer.Serialize(streamWriter, entry);
memoryStream.Seek(0, SeekOrigin.Begin);
var streamReader = new StreamReader(memoryStream, System.Text.Encoding.UTF8);
var utf8EncodedXml = streamReader.ReadToEnd();
Jon*_*eet 257
No, you can use a StringWriter to get rid of the intermediate MemoryStream. However, to make the XML declaration say UTF-8, you need a StringWriter that overrides the Encoding property:
public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}
Or, if you're not using C# 6 yet:
public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding { get { return Encoding.UTF8; } }
}
Then:
var serializer = new XmlSerializer(typeof(SomeSerializableObject));
string utf8;
using (StringWriter writer = new Utf8StringWriter())
{
    serializer.Serialize(writer, entry);
    utf8 = writer.ToString();
}
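Obviously you can turn Utf8StringWriter into a more general class that accepts any encoding in its constructor, but in my experience UTF-8 is by far the most commonly required "custom" encoding for a StringWriter :)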
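As a rough sketch of that more general class: the names EncodingStringWriter, Item and EncodingStringWriterDemo below are made up for illustration and are not part of the answer above. The writer simply reports whichever encoding it was constructed with, and XmlSerializer uses that value when it writes the XML declaration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical generalisation of Utf8StringWriter: reports the encoding it is given.
public sealed class EncodingStringWriter : StringWriter
{
    private readonly Encoding _encoding;

    public EncodingStringWriter(Encoding encoding)
    {
        if (encoding == null) throw new ArgumentNullException("encoding");
        _encoding = encoding;
    }

    // The serializer reads this property to fill in encoding="..." in the declaration.
    public override Encoding Encoding { get { return _encoding; } }
}

// Hypothetical serializable type used only for the demo.
public class Item
{
    public int X { get; set; }
}

public static class EncodingStringWriterDemo
{
    public static string Serialize<T>(T value, Encoding encoding)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new EncodingStringWriter(encoding))
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        // Same object, two different declared encodings.
        Console.WriteLine(Serialize(new Item { X = 1 }, Encoding.UTF8));
        Console.WriteLine(Serialize(new Item { X = 1 }, Encoding.ASCII));
    }
}
```

Only the declared encoding changes here; the resulting string is still an ordinary .NET string in memory.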
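Now, as Jon Hanna says, this will still be UTF-16 internally, but presumably you're going to pass it to something else at some point that converts it into binary data. At that point you can take the string above, convert it into UTF-8 bytes, and all will be well, because the XML declaration will specify "utf-8" as the encoding.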
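A minimal sketch of that last step, assuming a hypothetical Payload type and a helper named ToUtf8Bytes (neither appears in the answer): the string is produced with the UTF-8-declaring writer and then encoded with Encoding.UTF8.GetBytes, so the bytes and the declaration agree.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical serializable type used only for the demo.
public class Payload
{
    public int X { get; set; }
}

public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding { get { return Encoding.UTF8; } }
}

public static class Utf8BytesDemo
{
    public static byte[] ToUtf8Bytes(object value)
    {
        var serializer = new XmlSerializer(value.GetType());
        using (var writer = new Utf8StringWriter())
        {
            serializer.Serialize(writer, value);
            // The string is UTF-16 in memory; GetBytes produces the UTF-8 octets.
            // GetBytes does not prepend a byte order mark.
            return Encoding.UTF8.GetBytes(writer.ToString());
        }
    }

    public static void Main()
    {
        byte[] bytes = ToUtf8Bytes(new Payload { X = 42 });
        // Decode again just to display what was encoded.
        Console.WriteLine(Encoding.UTF8.GetString(bytes));
    }
}
```

If the data is only ever going to be bytes, writing directly to a stream avoids the intermediate string altogether.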
EDIT: A short but complete example to show this working:
using System;
using System.Text;
using System.IO;
using System.Xml.Serialization;

public class Test
{
    public int X { get; set; }

    static void Main()
    {
        Test t = new Test();
        var serializer = new XmlSerializer(typeof(Test));
        string utf8;
        using (StringWriter writer = new Utf8StringWriter())
        {
            serializer.Serialize(writer, t);
            utf8 = writer.ToString();
        }
        Console.WriteLine(utf8);
    }

    public class Utf8StringWriter : StringWriter
    {
        public override Encoding Encoding => Encoding.UTF8;
    }
}
Result:
<?xml version="1.0" encoding="utf-8"?>
<Test xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <X>0</X>
</Test>
Note the declared encoding of "utf-8", which is what we wanted, I believe.
Jon*_*nna 52
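Your code doesn't get the UTF-8 into memory, because you read it back into a string again: at that point it's no longer UTF-8 but UTF-16 (though ideally it's best to think of strings at a higher level than any encoding, except when forced to do otherwise).
To get the actual UTF-8 octets you could use: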
var serializer = new XmlSerializer(typeof(SomeSerializableObject));
var memoryStream = new MemoryStream();
var streamWriter = new StreamWriter(memoryStream, System.Text.Encoding.UTF8);
serializer.Serialize(streamWriter, entry);
byte[] utf8EncodedXml = memoryStream.ToArray();
I've left out the same disposal that you left out. I slightly favour the following (with normal disposal left in):
var serializer = new XmlSerializer(typeof(SomeSerializableObject));
using(var memStm = new MemoryStream())
using(var xw = XmlWriter.Create(memStm))
{
    serializer.Serialize(xw, entry);
    var utf8 = memStm.ToArray();
}
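This is much the same amount of complexity, but it does show that at every stage there is a reasonable choice to do something else, the most pressing of which is to serialize to somewhere other than memory, such as a file, a TCP/IP stream, a database, and so on. All in all, it's not really that verbose.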
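To illustrate serializing to somewhere other than memory, here is a minimal sketch that writes to any Stream. The Entry type, the SerializeTo helper and the temp-file name are assumptions made for the example. It also passes an XmlWriterSettings with a UTF8Encoding constructed with false, which is one way to get UTF-8 output without a byte order mark.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical serializable type used only for the demo.
public class Entry
{
    public int X { get; set; }
}

public static class StreamSerializationDemo
{
    // Works for any writable Stream: FileStream, NetworkStream, MemoryStream, ...
    public static void SerializeTo(Stream destination, object value)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // UTF-8, no byte order mark
            Indent = true
        };
        var serializer = new XmlSerializer(value.GetType());
        // Disposing the XmlWriter flushes it; with default settings it does not
        // close the underlying stream, so the caller keeps ownership of it.
        using (var xw = XmlWriter.Create(destination, settings))
        {
            serializer.Serialize(xw, value);
        }
    }

    public static void Main()
    {
        // Hypothetical output location.
        string path = Path.Combine(Path.GetTempPath(), "entry.xml");
        using (var file = File.Create(path))
        {
            SerializeTo(file, new Entry { X = 7 });
        }
        Console.WriteLine(File.ReadAllText(path));
    }
}
```

Swapping the FileStream for a MemoryStream and calling ToArray() gives the same bytes in memory.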
Seb*_*ldi 17
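The inheritance-based answer is very good; just remember to declare any constructor you need as well (constructors are not inherited):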
public class Utf8StringWriter : StringWriter
{
    public Utf8StringWriter(StringBuilder sb) : base (sb)
    {
    }

    public override Encoding Encoding { get { return Encoding.UTF8; } }
}
I found this blog post, which explains the problem very well and describes a few different solutions:
(dead link removed)
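I've settled on the idea that the best approach is to omit the XML declaration entirely while the XML is in memory. It actually is UTF-16 at that point anyway, and the XML declaration doesn't seem meaningful until the XML has been written to a file with a particular encoding. Even then the declaration is not required. At least, leaving it out doesn't seem to break deserialization.
As @Jon Hanna mentions, this can be done with an XmlWriter created like this: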
XmlWriter writer = XmlWriter.Create (output, new XmlWriterSettings() { OmitXmlDeclaration = true });
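A small round-trip sketch of that approach, assuming a hypothetical Note type and helper names chosen for the example: the object is serialized into a StringBuilder with OmitXmlDeclaration set, then read back with a StringReader to check that the missing declaration does not get in the way.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical serializable type used only for the demo.
public class Note
{
    public string Text { get; set; }
}

public static class OmitDeclarationDemo
{
    public static string Serialize(Note value)
    {
        var output = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var serializer = new XmlSerializer(typeof(Note));
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            serializer.Serialize(writer, value);
        }
        // No <?xml ...?> line, so nothing in the string claims a particular encoding.
        return output.ToString();
    }

    public static Note Deserialize(string xml)
    {
        var serializer = new XmlSerializer(typeof(Note));
        using (var reader = new StringReader(xml))
        {
            return (Note)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        string xml = Serialize(new Note { Text = "héllo" });
        Console.WriteLine(xml);
        Console.WriteLine(Deserialize(xml).Text);
    }
}
```

When the string is eventually written out, the encoding is chosen at that point by whatever writes the bytes.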