Optimize this: slow LINQ to Objects query

Question

Ben*_*sen 8 c# linq optimization performance linq-to-objects

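I have this query that is bothering me. It is encapsulated as a new query operator, and I wrote two versions of it to see which one performs better. Both perform horribly.

First attempt: declarative style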

public static IEnumerable<IEnumerable<T>> Section<T>(this IEnumerable<T> source, int length)
{
    return source.Any()
        ? source.Take(length).Cons(source.Skip(length).Section(length))
        : Enumerable.Empty<IEnumerable<T>>();
}

Second attempt: imperative "yield return" style

public static IEnumerable<IEnumerable<T>> Section<T>(this IEnumerable<T> source, int length)
{
    var fst = source.Take(length);
    var rst = source.Skip(length);

    yield return fst;

    if (rst.Any())
        foreach (var section in rst.Section(length))
            yield return section;
}

In fact, the second attempt is worse in terms of readability, composability, and speed.

Any clues as to how to optimize this?

Answer 1

ang*_*son 10

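If I understand your question correctly, you are trying to build a lazy implementation of an enumerator that splits a larger collection of items into smaller enumerable collections of items.

For instance, a sequence of a million numbers could be split into "sections" that each produce only 100 of them, and you want all of this done lazily, i.e. without collecting 100 items into a list before producing them.

First, your attempts re-iterate over the collection many times, which is bad, hence the performance problem.

If you are trying to build a purely lazy implementation, you should consider the following issues:

You want to iterate over the underlying collection only once
You should return enumerables that reuse the underlying enumerator
You need to handle the case where a section you return is not completely enumerated (for instance, the calling code only needed the first 50 of those 100 items).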

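Edit: Before I go into my simplistic solution, here are a few caveats about it:

You cannot save each section for later, i.e. you cannot do collection.Section(10).ToArray() to get an array of sections.
You cannot enumerate each section more than once, because doing so changes a hidden underlying data structure.

Basically: my solution is not general purpose. If you need that, you should go with @LBushkin's comment about MoreLinq Batch. I would hesitate to put my code into a class library; it would have to live locally where it is needed, or be renamed to something that clearly warns you about its issues.

Here is a simplistic implementation. I am pretty sure there are bugs in it, so you might want to implement a ton of unit tests for edge cases: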

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApplication20
{
    class SectionEnumerable<T> : IEnumerable<T>
    {
        private readonly IEnumerator<T> _Enumerator;

        public SectionEnumerable(IEnumerator<T> enumerator, int sectionSize)
        {
            _Enumerator = enumerator;
            Left = sectionSize;
        }

        public IEnumerator<T> GetEnumerator()
        {
            while (Left > 0)
            {
                Left--;
                yield return _Enumerator.Current;
                if (Left > 0)
                    if (!_Enumerator.MoveNext())
                        break;
            }
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }

        public int Left { get; private set; }
    }

    static class SequenceExtensions
    {
        public static IEnumerable<IEnumerable<T>> Section<T>(this IEnumerable<T> collection, int sectionSize)
        {
            if (collection == null)
                throw new ArgumentNullException("collection");
            if (sectionSize < 1)
                throw new ArgumentOutOfRangeException("sectionSize");

            using (IEnumerator<T> enumerator = collection.GetEnumerator())
            {
                while (enumerator.MoveNext())
                {
                    SectionEnumerable<T> enumerable = new SectionEnumerable<T>(enumerator, sectionSize);
                    yield return enumerable;
                    for (int index = 0; index < enumerable.Left; index++)
                        if (!enumerator.MoveNext())
                            yield break;
                }
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var sequence = Enumerable.Range(0, 100);
            var sections = sequence.Section(10);
            foreach (var section in sections)
            {
                Console.WriteLine(
                    String.Join(", ",
                    section.Take(5).ToArray().Select(i => i.ToString()).ToArray()));
            }
            Console.ReadLine();
        }
    }
}

Output:

0, 1, 2, 3, 4
10, 11, 12, 13, 14
20, 21, 22, 23, 24
30, 31, 32, 33, 34
40, 41, 42, 43, 44
50, 51, 52, 53, 54
60, 61, 62, 63, 64
70, 71, 72, 73, 74
80, 81, 82, 83, 84
90, 91, 92, 93, 94

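Things you should unit-test:

An empty input collection produces no sections
A collection with exactly section-size elements produces only one section
A collection whose element count is a multiple of the section size (i.e. 10, 20, 30, etc. elements with a section size of 5 or 10) does not produce an empty section after all the expected ones
It is actually lazy: if you enumerate the first 10-element section but only the first 5 items of the second section, only the first 15 elements of the underlying collection are enumerated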
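The checks listed above could be sketched roughly as follows. This is only an illustration, not a full test suite: `SectionChecks`, `CountSections`, and `PullsForFifteen` are hypothetical names, the implementation is a condensed copy of the code above (argument checks omitted) so the block compiles on its own, and plain console output stands in for a real test framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Condensed copy of the implementation above, so the sketch compiles on its own.
class SectionEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerator<T> _enumerator;
    public int Left { get; private set; }

    public SectionEnumerable(IEnumerator<T> enumerator, int sectionSize)
    {
        _enumerator = enumerator;
        Left = sectionSize;
    }

    public IEnumerator<T> GetEnumerator()
    {
        while (Left > 0)
        {
            Left--;
            yield return _enumerator.Current;
            if (Left > 0 && !_enumerator.MoveNext())
                break;
        }
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

static class SequenceExtensions
{
    public static IEnumerable<IEnumerable<T>> Section<T>(this IEnumerable<T> collection, int sectionSize)
    {
        using (IEnumerator<T> enumerator = collection.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                var section = new SectionEnumerable<T>(enumerator, sectionSize);
                yield return section;
                // Skip whatever the caller left unread in this section.
                for (int index = 0; index < section.Left; index++)
                    if (!enumerator.MoveNext())
                        yield break;
            }
        }
    }
}

static class SectionChecks
{
    // Number of sections produced when every section is fully enumerated in order.
    public static int CountSections(int itemCount, int sectionSize)
    {
        int sections = 0;
        foreach (var section in Enumerable.Range(0, itemCount).Section(sectionSize))
        {
            foreach (var item in section) { }
            sections++;
        }
        return sections;
    }

    // Elements pulled from the source when the first section is read fully
    // and only the first five items of the second section are read.
    public static int PullsForFifteen()
    {
        int pulled = 0;
        IEnumerable<int> Source()
        {
            for (int i = 0; i < 100; i++) { pulled++; yield return i; }
        }

        int sectionIndex = 0;
        foreach (var section in Source().Section(10))
        {
            int wanted = sectionIndex == 0 ? 10 : 5;
            int read = 0;
            foreach (var item in section)
                if (++read == wanted) break;
            if (++sectionIndex == 2) break;
        }
        return pulled;
    }

    static void Main()
    {
        Console.WriteLine(CountSections(0, 10));   // 0: empty input, no sections
        Console.WriteLine(CountSections(10, 10));  // 1: exactly one section
        Console.WriteLine(CountSections(30, 10));  // 3: no trailing empty section
        Console.WriteLine(PullsForFifteen());      // 15: only 15 source elements read
    }
}
```

Note that these checks only cover the "enumerate sections in order" usage; they say nothing about storing sections for later, which the caveats above already rule out.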

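This is seriously flawed. As soon as you step outside the test case and do anything different, you get crazy results. For example: "var sections = sequence.Section(10).ToArray(); foreach (var num in sections[5]) Console.WriteLine(num);" - you would expect to get {50 ... 59}. Instead you get {99}. (6 upvotes)
This looks a lot like the `Batch()` operator in MoreLinq; see: http://code.google.com/p/morelinq/source/browse/trunk/MoreLinq/Batch.cs (2 upvotes)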

Answer 2

Eri*_*ert 9

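I suspect that the problem you are having is related to the fact that enumerating the final result is at least an O(n^2) operation, possibly worse; I have not worked it all out in my head yet.

Why is that? Well, suppose you have [1, 2, 3, 4, 5, 6] and you split it into what you think is {{1, 2}, {3, 4}, {5, 6}}.

That is not what you have done. You have in fact split it into {take the first two; take the first two and discard them, then take the next two; take the first two and discard them, then take the next two and discard them, then take the third two}.

Notice how each step along the way recalculates the result? That is because the array could change between calls to the enumeration. LINQ was designed to always give you up-to-date results; you wrote a query that means "skip the first four and iterate the next two", and that is exactly what you get: a query that executes that code when you enumerate it.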
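The re-enumeration described above can be made visible by counting how many elements are pulled from a lazy source. The following is a minimal sketch, not the code from the question: `ReenumerationDemo` and `CountPulls` are hypothetical names, and the sections are built with plain `Skip`/`Take` calls instead of recursion, which shows the same effect.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ReenumerationDemo
{
    // Counts how many elements are pulled from the source while all
    // sections are produced with Skip/Take (hypothetical helper).
    public static int CountPulls(int itemCount, int length)
    {
        int pulled = 0;

        // Local iterator: a lazy source that records every element it yields.
        IEnumerable<int> Source()
        {
            for (int i = 1; i <= itemCount; i++)
            {
                pulled++;
                yield return i;
            }
        }

        IEnumerable<int> source = Source();
        for (int start = 0; start < itemCount; start += length)
        {
            // Each section restarts the source from the beginning.
            source.Skip(start).Take(length).ToList();
        }
        return pulled;
    }

    static void Main()
    {
        Console.WriteLine(CountPulls(6, 2));      // at least 2 + 4 + 6 = 12 pulls for 6 items
        Console.WriteLine(CountPulls(1000, 10));  // far more than 1000 pulls
    }
}
```

With n items and sections of length k, section number i has to walk past roughly i * k elements before it yields anything, so the total work grows quadratically in n for a fixed k.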

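Is the original sequence small enough and fast enough that you can read the whole thing into memory and split it all up at once, rather than trying to do so lazily? Alternatively, is the sequence indexable? If all you get is forward-only access to the sequence, and it is too big or too slow to read into memory all at once, then there is not a whole lot you can do here. But if you have one or both of those properties, you can make this at least linear.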
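One way to get linear behavior, assuming each individual section is small enough to hold in memory, is to copy items into a buffer per section during a single pass over the source. This is a sketch under that assumption, not MoreLinq's `Batch` and not the code from either answer; `SectionBuffered` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BufferedSectionExtensions
{
    // Hypothetical name; copies each section into its own list, so the
    // source is enumerated exactly once and sections can be stored or re-read.
    public static IEnumerable<IReadOnlyList<T>> SectionBuffered<T>(
        this IEnumerable<T> source, int length)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (length < 1) throw new ArgumentOutOfRangeException(nameof(length));
        return Iterator(source, length);
    }

    private static IEnumerable<IReadOnlyList<T>> Iterator<T>(IEnumerable<T> source, int length)
    {
        var buffer = new List<T>(length);
        foreach (T item in source)
        {
            buffer.Add(item);
            if (buffer.Count == length)
            {
                yield return buffer;
                buffer = new List<T>(length);  // fresh list so earlier sections stay intact
            }
        }
        if (buffer.Count > 0)
            yield return buffer;               // trailing partial section
    }
}

static class Program
{
    static void Main()
    {
        var sections = Enumerable.Range(0, 25).SectionBuffered(10).ToArray();
        Console.WriteLine(sections.Length);                 // 3
        Console.WriteLine(string.Join(", ", sections[2]));  // 20, 21, 22, 23, 24
    }
}
```

The trade-off is that a section is no longer lazy within itself: all of its items are read before it is handed to the caller. On .NET 6 and later, the built-in `Enumerable.Chunk` method offers similar buffered behavior and returns arrays.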

Archived: 15 years, 8 months ago
Views: 3279
Last activity: 9 years, 8 months ago