这里的性能错误可能在哪里？

Question

这里的性能错误可能在哪里？

use*_*670 -5 c# linq algorithm optimization performance

许多测试用例都是超时的.我已经确定我在任何地方使用懒惰的评估,线性(或更好)的例程等等.我很震惊,这仍然不符合性能基准.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Mine
{
    public int Distance { get; set; } // from river
    public int Gold { get; set; } // in tons
}

class Solution 
{
    static void Main(String[] args) 
    {
        // helper function for reading lines
        Func<string, int[]> LineToIntArray = (line) => Array.ConvertAll(line.Split(' '), Int32.Parse);

        int[] line1 = LineToIntArray(Console.ReadLine());
        int N = line1[0], // # of mines
            K = line1[1]; // # of pickup locations

        // Populate mine info
        List<Mine> mines = new List<Mine>();
        for(int i = 0; i < N; ++i)
        {
            int[] line = LineToIntArray(Console.ReadLine());
            mines.Add(new Mine() { Distance = line[0], Gold = line[1] });
        }

        // helper function for cost of a move
        Func<Mine, Mine, int> MoveCost = (mine1, mine2) => 
            Math.Abs(mine1.Distance - mine2.Distance) * mine1.Gold;

        int sum = 0; // running total of move costs

        // all move combinations on the current list of mines, 
        // given by the indicies of the mines
        var indices = Enumerable.Range(0, N); 
        var moves = from i1 in indices
                    from i2 in indices
                    where i1 != i2
                    select new int[] { i1, i2 };

        while(N != K) // while number of mines hasn't been consolidated to K
        {           
            // get move with the least cost
            var cheapest = moves.Aggregate(
                (prev, cur) => MoveCost(mines[prev[0]],mines[prev[1]]) 
                                   < MoveCost(mines[cur[0]], mines[cur[1]])
                               ? prev : cur
            );

            int i = cheapest[0], // index of source mine of cheapest move
                j = cheapest[1]; // index of destination mine of cheapest move

            // add cost to total
            sum += MoveCost(mines[i], mines[j]);

            // move gold from source to destination          
            mines[j].Gold += mines[i].Gold;

            // remove from moves any that had the i-th mine as a destination or source
            moves = from move in moves
                    where move[0] == i || move[1] == i
                    select move;

            // update size number of mines after consolidation
            --N;    
        }

        Console.WriteLine(sum);
    }
}

Run Code Online (Sandbox Code Playgroud)

Answer 1

pok*_*oke 7

懒惰的评估不会使糟糕的算法表现更好.当这些性能问题对您产生影响时,它会延迟.懒惰评估可以帮助解决的是空间复杂性,即减少执行算法所需的内存量.由于数据是懒惰生成的,因此您不必(必然)同时拥有内存中的所有数据.

然而,依靠懒惰的评估来修复你的空间(或时间)复杂性问题很容易让你陷入困境.查看以下示例代码:

var moves = Enumerable.Range(0, 5).Select(x => {
    Console.WriteLine("Generating");
    return x;
});

var aggregate = moves.Aggregate((p, c) => {
    Console.WriteLine("Aggregating");
    return p + c;
});

var newMoves = moves.Where(x => {
    Console.WriteLine("Filtering");
    return x % 2 == 0;
});

newMoves.ToList();

Run Code Online (Sandbox Code Playgroud)

正如你所看到的,无论是aggregate和newMoves依赖于懒惰的评估都是moves可枚举的.由于原始计数moves为5,我们将在输出中看到4个"聚合"行,以及5个"过滤"行.但是,您经常期望"生成"出现在控制台中？

答案是10.这是因为它moves是一个生成器并且正在被懒惰地评估.当多个位置请求它时,将为每个位置创建一个迭代器,这最终意味着生成器将多次执行(以生成独立的结果).

这不一定是个问题,但在你的情况下,很快就会变成一个问题.假设我们通过另一轮聚合继续上面的示例代码.第二个聚合将消耗newMoves,而这将消耗原始聚合moves.所以为了聚合,我们将重新运行原始moves生成器和newMoves生成器.如果我们要添加另一级别的过滤,下一轮聚合将运行三个互锁发电机,再次重新运行原始moves发电机.

由于原来的moves发电机产生二次大小的枚举,并具有实际时间复杂度的O(n²),这是一个实际的问题.在每次迭代中,您添加另一轮过滤,该过滤将与moves可枚举的大小成线性关系,并且您实际上完全消耗了可枚举的聚合.所以,你最终O(n^2 + n^3 + n^4 + n^5 + …)最终将总和n^j为j起始于2达N-K.这是一个非常糟糕的时间复杂性,而这一切只是因为你试图通过懒惰地评估这些动作来节省内存.

因此,使这更好的第一步是避免懒惰的评估.你经常迭代moves和过滤它,所以你应该把它放在内存中.是的,这意味着你有一个二次大小的数组,但实际上你不需要更多.这也限制了您的时间复杂性.是的,你仍然需要在线性时间内过滤列表(所以O(n²)因为大小是n²)并且你在循环中这样做,所以你最终会得到立方时间(O(n³)),但这已经是你的上限(迭代列表)循环中的恒定次数只会使时间复杂度增加一个常数,这无关紧要).

完成后,您应该考虑原始问题,考虑一下您实际在做什么.我相信如果您使用更好的信息,并且使用数据结构(例如散列集,或者已经存储了移动成本的图形),您可以进一步降低计算复杂度,这有助于您进行过滤和聚合.因为我不知道你原来的问题所以我不能给你一些确切的想法,但我确信你能做些什么.

最后,如果您遇到性能问题,请务必记住您的代码.分析器会告诉您代码的哪些部分是最昂贵的,因此您可以清楚地了解应该尝试优化的内容以及优化时不需要关注的内容(因为优化已经快速的部分对您没有帮助得到更快).

归档时间：	9 年，4 月前
查看次数：	99 次
最近记录：	9 年，4 月前