Edw*_*uay 4 c# xml linq performance
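The following code compares two XML texts and returns a collection of the data changes between them.
This code works fine, but it needs to be as resource-friendly as possible.
Is there a faster way to do this in LINQ, for example without creating two collections of XElements and comparing each of their fields for differences?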
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace TestXmlDiff8822
{
    class Program
    {
        static void Main(string[] args)
        {
            XDocument xdoc1 = XDocument.Parse(GetXml1());
            XDocument xdoc2 = XDocument.Parse(GetXml2());

            List<HistoryFieldChange> hfcList = GetHistoryFieldChanges(xdoc1, xdoc2);

            foreach (var hfc in hfcList)
            {
                Console.WriteLine("{0}: from {1} to {2}", hfc.FieldName, hfc.ValueBefore, hfc.ValueAfter);
            }

            Console.ReadLine();
        }

        static public List<HistoryFieldChange> GetHistoryFieldChanges(XDocument xdoc1, XDocument xdoc2)
        {
            List<HistoryFieldChange> hfcList = new List<HistoryFieldChange>();

            var elements1 = from e in xdoc1.Root.Elements()
                            select e;

            var elements2 = from e in xdoc2.Root.Elements()
                            select e;

            for (int i = 0; i < elements1.Count(); i++)
            {
                XElement element1 = elements1.ElementAt(i);
                XElement element2 = elements2.ElementAt(i);

                if (element1.Value != element2.Value)
                {
                    HistoryFieldChange hfc = new HistoryFieldChange();
                    hfc.EntityName = xdoc1.Root.Name.ToString();
                    hfc.FieldName = element1.Name.ToString();
                    hfc.KindOfChange = "fieldDataChange";
                    hfc.ObjectReference = (xdoc1.Descendants("Id").FirstOrDefault()).Value;
                    hfc.ValueBefore = element1.Value;
                    hfc.ValueAfter = element2.Value;
                    hfcList.Add(hfc);
                }
            }

            return hfcList;
        }

        public static string GetXml1()
        {
            return @"
<Customer>
    <Id>111</Id>
    <FirstName>Sue</FirstName>
    <LastName>Smith</LastName>
</Customer>
";
        }

        public static string GetXml2()
        {
            return @"
<Customer>
    <Id>111</Id>
    <FirstName>Sue2</FirstName>
    <LastName>Smith-Thompson</LastName>
</Customer>
";
        }
    }

    public class HistoryFieldChange
    {
        public string EntityName { get; set; }
        public string FieldName { get; set; }
        public string ObjectReference { get; set; }
        public string KindOfChange { get; set; }
        public string ValueBefore { get; set; }
        public string ValueAfter { get; set; }
    }
}
You should be able to get all the elements with differing values by using a LINQ join on the element name.
var name = xdoc1.Root.Name.ToString();
var id = (xdoc1.Descendants("Id").FirstOrDefault()).Value;

var diff = from o in xdoc1.Root.Elements()
           join n in xdoc2.Root.Elements() on o.Name equals n.Name
           where o.Value != n.Value
           select new HistoryFieldChange()
           {
               EntityName = name,
               FieldName = o.Name.ToString(),
               KindOfChange = "fieldDataChange",
               ObjectReference = id,
               ValueBefore = o.Value,
               ValueAfter = n.Value,
           };
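One thing worth spelling out is that a join pairs elements by name rather than by position, which is a different behavior from the index-based loop in the question. The sketch below is a cut-down, self-contained version of the query; the names FieldChange and JoinDiff are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, trimmed-down stand-in for HistoryFieldChange.
public class FieldChange
{
    public string FieldName { get; set; }
    public string ValueBefore { get; set; }
    public string ValueAfter { get; set; }
}

public static class JoinDiff
{
    // Pairs the root's child elements by name and keeps the pairs whose text differs.
    public static IEnumerable<FieldChange> Changes(XDocument before, XDocument after)
    {
        return from o in before.Root.Elements()
               join n in after.Root.Elements() on o.Name equals n.Name
               where o.Value != n.Value
               select new FieldChange
               {
                   FieldName = o.Name.ToString(),
                   ValueBefore = o.Value,
                   ValueAfter = n.Value
               };
    }
}
```

With this version, reordering the elements in the second document does not produce false differences. On the other hand, an element that exists in only one of the two documents is silently dropped by the join, and a name that occurs several times is paired with every element of the same name on the other side, so it fits best when field names are unique.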
One advantage of this approach is that it is easy to parallelize for multi-core machines: just use PLINQ and the AsParallel extension method.
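// Both sources must be parallel queries; with only the inner one marked,
// the ordinary sequential Enumerable.Join is used.
var diff = from o in xdoc1.Root.Elements().AsParallel()
           join n in xdoc2.Root.Elements().AsParallel() on o.Name equals n.Name
           where o.Value != n.Value
           ...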
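Voilà: if the query can be parallelized on your machine, PLINQ takes care of it automatically. This speeds things up for large documents, but if your documents are small, you will likely get a better speedup by parallelizing the outer loop that calls GetHistoryFieldChanges, using something like Parallel.For.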
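For the many-small-documents case, the outer loop could look roughly like the sketch below. It assumes the document pairs are independent of each other and arrive as XML strings; BatchDiff, ChangedFields and DiffAll are hypothetical names, and the per-pair result is reduced to a list of field names to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class BatchDiff
{
    // Returns the names of child elements whose text differs between two documents.
    public static List<string> ChangedFields(string xmlBefore, string xmlAfter)
    {
        XDocument before = XDocument.Parse(xmlBefore);
        XDocument after = XDocument.Parse(xmlAfter);

        var query = from o in before.Root.Elements()
                    join n in after.Root.Elements() on o.Name equals n.Name
                    where o.Value != n.Value
                    select o.Name.ToString();

        // Materialize here so the comparison work happens inside the worker.
        return query.ToList();
    }

    // Compares many independent document pairs, one pair per loop iteration.
    public static List<string>[] DiffAll(IReadOnlyList<(string Before, string After)> pairs)
    {
        var results = new List<string>[pairs.Count];

        // Each iteration writes only to its own slot, so no locking is needed.
        Parallel.For(0, pairs.Count, i =>
        {
            results[i] = ChangedFields(pairs[i].Before, pairs[i].After);
        });

        return results;
    }
}
```

Each iteration parses and compares its own pair, so no XDocument is shared between threads, and the results come back in the same order as the input pairs because every iteration fills a fixed index.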
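Another advantage is that GetHistoryFieldChanges can simply return an IEnumerable. There is no need to spend time allocating a List: the items are produced as they are enumerated, and the LINQ query is not executed until then.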
IEnumerable<HistoryFieldChange> GetHistoryFieldChanges(...)
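To make the deferred execution visible, here is a small sketch that counts how many element pairs are actually compared. DeferredDemo and its Comparisons counter are hypothetical and exist only for this demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DeferredDemo
{
    // Counts how many element pairs have been compared so far.
    public static int Comparisons;

    // Builds the query but does not run it; the work happens on enumeration.
    public static IEnumerable<string> ChangedFields(XDocument before, XDocument after)
    {
        return from o in before.Root.Elements()
               join n in after.Root.Elements() on o.Name equals n.Name
               where Differs(o, n)
               select o.Name.ToString();
    }

    private static bool Differs(XElement o, XElement n)
    {
        Comparisons++;
        return o.Value != n.Value;
    }
}
```

Calling ChangedFields leaves Comparisons unchanged; the counter only moves once the result is enumerated, for example with foreach or ToList(). The flip side is that every new enumeration runs the whole query again, so a caller that needs the results more than once is probably better off calling ToList() once and reusing the list.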
Here are the timings for 1M iterations of the original, Yannick's in-order, and my non-parallel LINQ-only implementations, run with this code on my 2.8 GHz laptop.
Elapsed Orig 3262ms
All Linq 1761ms
In Order Only 2383ms
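For anyone who wants to reproduce this kind of measurement, a minimal timing helper might look like the sketch below. It is not the benchmark code used for the numbers above; Bench and TimeMs are hypothetical names, and it assumes a single warm-up call is enough to keep JIT compilation out of the measurement.

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Runs the action once as a warm-up, then `iterations` times under a stopwatch,
    // and returns the elapsed wall-clock time in milliseconds.
    public static long TimeMs(int iterations, Action action)
    {
        action(); // warm-up call, not timed

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();

        return sw.ElapsedMilliseconds;
    }
}
```

When the method under test returns a lazily evaluated IEnumerable, the action should enumerate the result, for instance with Count() or ToList(). Otherwise only the construction of the query is timed, and the numbers are not comparable with a version that builds a List.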
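I noticed one interesting thing: running the code in debug mode and then in release mode, it is surprising how much the compiler can optimize the pure LINQ version. I think returning an IEnumerable helps the compiler.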