fra*_*nck 35 c# linq algorithm optimization performance
我有几个巨大的已排序的可枚举序列,我想合并.这些列表被操作为IEnumerable但已经排序.由于输入列表已排序,因此应该可以在一次行程中合并它们,而无需重新排序任何内容.
我想保留deferred执行行为.
我试着编写一个天真的算法来做到这一点(见下文).但是,它看起来很丑陋,我确信它可以进行优化.它可能存在更多的学术算法......
IEnumerable<T> MergeOrderedLists<T, TOrder>(IEnumerable<IEnumerable<T>> orderedlists,
Func<T, TOrder> orderBy)
{
var enumerators = orderedlists.ToDictionary(l => l.GetEnumerator(), l => default(T));
IEnumerator<T> tag = null;
var firstRun = true;
while (true)
{
var toRemove = new List<IEnumerator<T>>();
var toAdd = new List<KeyValuePair<IEnumerator<T>, T>>();
foreach (var pair in enumerators.Where(pair => firstRun || tag == pair.Key))
{
if (pair.Key.MoveNext())
toAdd.Add(pair);
else
toRemove.Add(pair.Key);
}
foreach (var enumerator in toRemove)
enumerators.Remove(enumerator);
foreach (var pair in toAdd)
enumerators[pair.Key] = pair.Key.Current;
if (enumerators.Count == 0)
yield break;
var min = enumerators.OrderBy(t => orderBy(t.Value)).FirstOrDefault();
tag = min.Key;
yield return min.Value;
firstRun = false;
}
}
Run Code Online (Sandbox Code Playgroud)
该方法可以这样使用:
// Person lists are already sorted by age
MergeOrderedLists(orderedList, p => p.Age);
Run Code Online (Sandbox Code Playgroud)
假设以下Person类存在于某处:
public class Person
{
public int Age { get; set; }
}
Run Code Online (Sandbox Code Playgroud)
应该保留重复项,我们不关心它们在新序列中的顺序.你看到我可以使用任何明显的优化吗?
use*_*116 13
这是我的第四个(感谢@tanascius将其推向更多的LINQ)切入它:
public static IEnumerable<T> MergePreserveOrder3<T, TOrder>(
this IEnumerable<IEnumerable<T>> aa,
Func<T, TOrder> orderFunc)
where TOrder : IComparable<TOrder>
{
var items = aa.Select(xx => xx.GetEnumerator()).Where(ee => ee.MoveNext())
.OrderBy(ee => orderFunc(ee.Current)).ToList();
while (items.Count > 0)
{
yield return items[0].Current;
var next = items[0];
items.RemoveAt(0);
if (next.MoveNext())
{
// simple sorted linear insert
var value = orderFunc(next.Current);
var ii = 0;
for ( ; ii < items.Count; ++ii)
{
if (value.CompareTo(orderFunc(items[ii].Current)) <= 0)
{
items.Insert(ii, next);
break;
}
}
if (ii == items.Count) items.Add(next);
}
else next.Dispose(); // woops! can't forget IDisposable
}
}
Run Code Online (Sandbox Code Playgroud)
结果:
for (int p = 0; p < people.Count; ++p)
{
Console.WriteLine("List {0}:", p + 1);
Console.WriteLine("\t{0}", String.Join(", ", people[p].Select(x => x.Name)));
}
Console.WriteLine("Merged:");
foreach (var person in people.MergePreserveOrder(pp => pp.Age))
{
Console.WriteLine("\t{0}", person.Name);
}
List 1:
8yo, 22yo, 47yo, 49yo
List 2:
35yo, 47yo, 60yo
List 3:
28yo, 55yo, 64yo
Merged:
8yo
22yo
28yo
35yo
47yo
47yo
49yo
55yo
60yo
64yo
Run Code Online (Sandbox Code Playgroud)
改进了.Net 4.0的Tuple支持:
public static IEnumerable<T> MergePreserveOrder4<T, TOrder>(
this IEnumerable<IEnumerable<T>> aa,
Func<T, TOrder> orderFunc) where TOrder : IComparable<TOrder>
{
var items = aa.Select(xx => xx.GetEnumerator())
.Where(ee => ee.MoveNext())
.Select(ee => Tuple.Create(orderFunc(ee.Current), ee))
.OrderBy(ee => ee.Item1).ToList();
while (items.Count > 0)
{
yield return items[0].Item2.Current;
var next = items[0];
items.RemoveAt(0);
if (next.Item2.MoveNext())
{
var value = orderFunc(next.Item2.Current);
var ii = 0;
for (; ii < items.Count; ++ii)
{
if (value.CompareTo(items[ii].Item1) <= 0)
{ // NB: using a tuple to minimize calls to orderFunc
items.Insert(ii, Tuple.Create(value, next.Item2));
break;
}
}
if (ii == items.Count) items.Add(Tuple.Create(value, next.Item2));
}
else next.Item2.Dispose(); // woops! can't forget IDisposable
}
}
Run Code Online (Sandbox Code Playgroud)
Dou*_*ean 11
我想这可能会提高清晰度和性能是这样的:
T,IEnumerable<T>根据你的比较函数有序TIEnumerable<T>要合并的项目,将项目添加到优先级队列中,IEnumerable<T>并使用对其来源的引用进行注释IEnumerable<T>其注释推进到下一个元素MoveNext()返回true,则将下一个元素添加到优先级队列中,该队列使用对IEnumerable<T>您刚才进行的引用进行注释MoveNext()返回false,则不要向优先级队列添加任何内容这是一个具有非常好的复杂性分析的解决方案,并且比所提出的其他解决方案短得多.
public static IEnumerable<T> Merge<T>(this IEnumerable<IEnumerable<T>> self)
where T : IComparable<T>
{
var es = self.Select(x => x.GetEnumerator()).Where(e => e.MoveNext());
var tmp = es.ToDictionary(e => e.Current);
var dict = new SortedDictionary<T, IEnumerator<T>>(tmp);
while (dict.Count > 0)
{
var key = dict.Keys.First();
var cur = dict[key];
dict.Remove(key);
yield return cur.Current;
if (cur.MoveNext())
dict.Add(cur.Current, cur);
}
}
Run Code Online (Sandbox Code Playgroud)
您希望合并多少个列表?如果要合并许多不同的列表,看起来您的算法效率不高.这一行是问题:
var min = enumerators.OrderBy(t => orderBy(t.Value)).FirstOrDefault();
Run Code Online (Sandbox Code Playgroud)
这将对所有列表中的每个元素运行一次,因此您的运行时将为O(n*m),其中n是所有列表中的TOTAL元素数,n是列表数.根据列表列表中列表的平均长度表示,运行时为O(a*m ^ 2).
如果您需要合并很多列表,我建议使用堆.然后,每次迭代都可以从堆中删除最小值,并将下一个元素从最小值来自的列表中添加到堆中.
这是一个没有分类的解决方案......只需要最少的比较次数.(为简单起见,我省略了实际的顺序func传递).更新以构建平衡树: -
/// <summary>
/// Merge a pair of ordered lists
/// </summary>
public static IEnumerable<T> Merge<T>(IEnumerable<T> aList, IEnumerable<T> bList)
where T:IComparable<T>
{
var a = aList.GetEnumerator();
bool aOK = a.MoveNext();
foreach (var b in bList)
{
while (aOK && a.Current.CompareTo(b) <= 0) {yield return a.Current; aOK = a.MoveNext();}
yield return b;
}
// And anything left in a
while (aOK) { yield return a.Current; aOK = a.MoveNext(); }
}
/// <summary>
/// Merge lots of sorted lists
/// </summary>
public static IEnumerable<T> Merge<T>(IEnumerable<IEnumerable<T>> listOfLists)
where T : IComparable<T>
{
int n = listOfLists.Count();
if (n < 2)
return listOfLists.FirstOrDefault();
else
return Merge (Merge(listOfLists.Take(n/2)), Merge(listOfLists.Skip(n/2)));
}
public static void Main(string[] args)
{
var sample = Enumerable.Range(1, 5).Select((i) => Enumerable.Range(i, i+5).Select(j => string.Format("Test {0:00}", j)));
Console.WriteLine("Merged:");
foreach (var result in Merge(sample))
{
Console.WriteLine("\t{0}", result);
}
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
6590 次 |
| 最近记录: |