spe*_*der 7 c# string xelement
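We collect a large number of strings and send them to our clients as XML fragments. These strings can contain literally any character. We have been seeing errors caused by trying to serialize XElement instances that contain "bad" characters. Here is an example: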
var message = new XElement("song");
char c = (char)0x1a; //sub
var someData = string.Format("some{0}stuff", c);
var attr = new XAttribute("someAttr", someData);
message.Add(attr);
string msgStr = message.ToString(SaveOptions.DisableFormatting); //exception here
The code above throws an exception at the marked line. Here is the stack trace:
'SUB', hexadecimal value 0x1A, is an invalid character.
System.ArgumentException
System.ArgumentException: '', hexadecimal value 0x1A, is an invalid character.
   at System.Xml.XmlEncodedRawTextWriter.InvalidXmlChar(Int32 ch, Char* pDst, Boolean entitize)
   at System.Xml.XmlEncodedRawTextWriter.WriteAttributeTextBlock(Char* pSrc, Char* pSrcEnd)
   at System.Xml.XmlEncodedRawTextWriter.WriteString(String text)
   at System.Xml.XmlWellFormedWriter.WriteString(String text)
   at System.Xml.XmlWriter.WriteAttributeString(String prefix, String localName, String ns, String value)
   at System.Xml.Linq.ElementWriter.WriteStartElement(XElement e)
   at System.Xml.Linq.ElementWriter.WriteElement(XElement e)
   at System.Xml.Linq.XElement.WriteTo(XmlWriter writer)
   at System.Xml.Linq.XNode.GetXmlString(SaveOptions o)
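I suspect this is not the correct behaviour and that the bad char should be escaped into the XML instead. Whether that is desirable is a question I'll answer later.
So here is the question:
Is there some way to treat the strings so that this error cannot occur, or should I simply strip all characters below char 0x20 and cross my fingers?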
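For reference, the "strip everything XML does not allow" fallback mentioned in the question could be sketched roughly as follows. This is only an illustration: `XmlCharFilter` and `StripInvalidXmlChars` are hypothetical names, and the allowed ranges are taken from the XML 1.0 `Char` production (tab, LF, CR, U+0020 to U+D7FF, U+E000 to U+FFFD, plus valid surrogate pairs).

```csharp
using System;
using System.Text;

public static class XmlCharFilter
{
    // Hypothetical helper: drops every UTF-16 code unit that XML 1.0 does not allow.
    public static string StripInvalidXmlChars(string input)
    {
        if (input == null) return null;
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            // A well-formed surrogate pair encodes U+10000..U+10FFFF, which XML 1.0 allows.
            if (char.IsHighSurrogate(c) && i + 1 < input.Length && char.IsLowSurrogate(input[i + 1]))
            {
                sb.Append(c).Append(input[i + 1]);
                i++;
                continue;
            }
            bool allowed = c == '\t' || c == '\n' || c == '\r'
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD);
            if (allowed) sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string dirty = "some" + (char)0x1a + "stuff";
        Console.WriteLine(StripInvalidXmlChars(dirty)); // prints "somestuff"
    }
}
```

Stripping loses information, of course, so it only fits cases where the control characters carry no meaning for the client.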
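After some digging with ILSpy, it turns out that the CheckCharacters property on XmlWriterSettings/XmlReaderSettings controls whether an exception is thrown for invalid characters. Borrowing from the implementations of XNode.ToString and XDocument.Parse, I came up with the following examples:
To stringify an XLinq object that contains invalid (control) characters: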
XDocument xdoc = XDocument.Parse("<root>foo</root>");
using (StringWriter stringWriter = new StringWriter())
{
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings { OmitXmlDeclaration = true, CheckCharacters = false };
using (XmlWriter xmlWriter = XmlWriter.Create(stringWriter, xmlWriterSettings))
{
xdoc.WriteTo(xmlWriter);
}
return stringWriter.ToString();
}
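Applied to the failing element from the question, the same writer settings could be wrapped in a small helper along these lines. This is a sketch, not part of the original answer: `UncheckedXml` and `ToStringUnchecked` are hypothetical names, and the helper is limited to XElement so the default conformance level is sufficient.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class UncheckedXml
{
    // Hypothetical helper: serializes an element without character checking.
    public static string ToStringUnchecked(XElement element)
    {
        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true,
            CheckCharacters = false // do not throw on characters outside the XML Char range
        };
        using (var stringWriter = new StringWriter())
        {
            using (XmlWriter xmlWriter = XmlWriter.Create(stringWriter, settings))
            {
                element.WriteTo(xmlWriter);
            }
            // The XmlWriter has been disposed (and flushed) at this point.
            return stringWriter.ToString();
        }
    }

    public static void Main()
    {
        var message = new XElement("song");
        message.Add(new XAttribute("someAttr", "some" + (char)0x1a + "stuff"));
        // With the default settings this serialization throws ArgumentException.
        Console.WriteLine(ToStringUnchecked(message));
    }
}
```

The resulting string is not well-formed XML 1.0 as far as strict parsers are concerned, so the receiving side has to be lenient in the same way.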
To parse an XLinq object from text that contains invalid characters:
XDocument xdoc;
using (StringReader stringReader = new StringReader(text))
{
XmlReaderSettings xmlReaderSettings = new XmlReaderSettings { CheckCharacters = false, DtdProcessing = DtdProcessing.Parse, MaxCharactersFromEntities = 10000000L, XmlResolver = null };
using (XmlReader xmlReader = XmlReader.Create(stringReader, xmlReaderSettings))
{
xdoc = XDocument.Load(xmlReader);
}
}
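To see the difference the reader setting makes, a minimal comparison might look like the sketch below. `UncheckedXmlParser` and `ParseUnchecked` are hypothetical names, and only CheckCharacters is set here; the other reader settings from the snippet above are left at their defaults for brevity.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class UncheckedXmlParser
{
    // Hypothetical helper: parses text without character checking.
    public static XDocument ParseUnchecked(string text)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (var stringReader = new StringReader(text))
        using (XmlReader xmlReader = XmlReader.Create(stringReader, settings))
        {
            return XDocument.Load(xmlReader);
        }
    }

    public static void Main()
    {
        // A character reference to U+001A, which XML 1.0 does not allow.
        string text = "<song someAttr=\"some&#x1A;stuff\" />";

        try
        {
            XDocument.Parse(text); // default settings check characters
        }
        catch (XmlException ex)
        {
            Console.WriteLine("Default parse failed: " + ex.Message);
        }

        XDocument doc = ParseUnchecked(text);
        string value = (string)doc.Root.Attribute("someAttr");
        Console.WriteLine(value.IndexOf((char)0x1a)); // position of the control character, if preserved
    }
}
```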
This is what I use in my code:
static Lazy<Regex> ControlChars = new Lazy<Regex>(() => new Regex("[\x00-\x1f]", RegexOptions.Compiled));
private static string FixData_Replace(Match match)
{
if ((match.Value.Equals("\t")) || (match.Value.Equals("\n")) || (match.Value.Equals("\r")))
return match.Value;
// Emit a hexadecimal character reference, e.g. &#x001A; (the "x" is required for hex digits).
return "&#x" + ((int)match.Value[0]).ToString("X4") + ";";
}
public static string Fix(object data, MatchEvaluator replacer = null)
{
if (data == null) return null;
string fixed_data;
if (replacer != null) fixed_data = ControlChars.Value.Replace(data.ToString(), replacer);
else fixed_data = ControlChars.Value.Replace(data.ToString(), FixData_Replace);
return fixed_data;
}
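All characters below 0x20 (except \r, \n and \t) are replaced by their XML character reference: 0x1f => "&#x001F;". When the file is read back, the XML parser should convert it to 0x1f again. Just use new XAttribute("attribute", Fix(yourString)).
It works for XElement content, and it probably works for XAttributes as well.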
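A quick way to check how the replaced text behaves inside an attribute is sketched below. `FixDemo` is a hypothetical wrapper that condenses the regex approach above into one method and assumes the hexadecimal `&#x` prefix. Note that LINQ to XML escapes the ampersand when serializing, so the reference is likely to come back as the literal text `&#x001A;` rather than as the control character; it is worth confirming which of the two your client expects.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class FixDemo
{
    static readonly Regex ControlChars = new Regex(@"[\x00-\x1f]", RegexOptions.Compiled);

    // Condensed variant of the Fix method: replaces control characters
    // (except tab, LF, CR) with a textual hexadecimal character reference.
    public static string Fix(string data)
    {
        if (data == null) return null;
        return ControlChars.Replace(data, m =>
        {
            char c = m.Value[0];
            if (c == '\t' || c == '\n' || c == '\r') return m.Value;
            return "&#x" + ((int)c).ToString("X4") + ";";
        });
    }

    public static void Main()
    {
        string raw = "some" + (char)0x1a + "stuff";
        string fixedText = Fix(raw);
        Console.WriteLine(fixedText); // prints "some&#x001A;stuff"

        var message = new XElement("song", new XAttribute("someAttr", fixedText));
        string xml = message.ToString(SaveOptions.DisableFormatting); // no longer throws
        Console.WriteLine(xml);

        // Round trip: compare what the parser hands back with what went in.
        XElement parsed = XElement.Parse(xml);
        Console.WriteLine((string)parsed.Attribute("someAttr") == fixedText);
    }
}
```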