Lots of nested loops - how to make it faster?

2 performance matlab nested-loops

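Because of its many nested loops, this part of my code (shown below) takes a very long time to run. Is there a way to avoid these nested loops so that it runs faster?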

for k = 1:numel(UpscaledZLen.pq)
    for j = 1:numel(UpscaledRowLen.pq)
        for i = 1:numel(UpscaledColLen.pq)

            iZ = 1;
            while iZ <= UpscaledZLen.pq(k)
                for ZLag = 1:iZ

                    ihRow = 1;
                    while ihRow <= UpscaledRowLen.pq(j)
                        for hRowLag = 1:ihRow

                            ihCol = 1;
                            while ihCol <= UpscaledColLen.pq(i)
                                for hColLag = 1:ihCol
                                    temp1(hColLag) = trapz(AnalGamma(hRowLag,...
                                        1:hColLag,...
                                        ZLag));
                                end
                                temp2(ihCol) = trapz(temp1);
                                ihCol = ihCol + 1;
                            end

                            temp3(hRowLag) = trapz(temp2);
                        end
                        temp4(ihRow) = trapz(temp3);
                        ihRow = ihRow + 1;
                    end

                    temp5(ZLag) = trapz(temp4);
                end
                temp6(iZ) = trapz(temp5);
                iZ = iZ + 1;
            end

            NormVariance_AnalCorrAvg(i, j, k) = (2/((UpscaledRowLen.pq(j)*...
                UpscaledColLen.pq(i)*UpscaledZLen.pq(k))^2))*trapz(temp6);
        end
    end
end

This code attempts to implement the following integral expression:

[Image: the integral expression]

EDIT (SSCCE): As a short example, I took the following variable sizes and ran the code above to see how long it takes for these particular sizes:

nRow = 4;
nCol = 4;
nZ = 4;

RowLenScale.pq = 1:nRow;
ColLenScale.pq = 1:nCol;
UpscaledRowLen.pq = RowLenScale.pq(rem(RowLenScale.pq(end), RowLenScale.pq) == 0);
UpscaledColLen.pq = ColLenScale.pq(rem(ColLenScale.pq(end), ColLenScale.pq) == 0);

ZLenScale.pq = 1:nZ;
UpscaledZLen.pq = ZLenScale.pq(rem(ZLenScale.pq(end), ZLenScale.pq) == 0);

AnalGamma = rand(nRow, nCol, nZ);

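This example takes only 0.321976 seconds, whereas the original case with nRow = 100; nCol = 100; nZ = 20; has been running for more than 24 hours and is still at outermost loop index k = 4.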

A. *_*nda 7

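First of all, your code has a problem: the temporary variables are not initialized, so values left over from a previous iteration are still present in the next one. After correcting this and replacing the while loops with for loops, the code looks like this: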

NormVariance_AnalCorrAvg = nan(numel(UpscaledColLen.pq), numel(UpscaledRowLen.pq), numel(UpscaledZLen.pq));
for k = 1 : numel(UpscaledZLen.pq)
    for j = 1 : numel(UpscaledRowLen.pq)
        for i = 1 : numel(UpscaledColLen.pq)

            temp6 = nan(1, UpscaledZLen.pq(k));
            for iZ = 1 : UpscaledZLen.pq(k)
                temp5 = nan(1, iZ);
                for ZLag = 1 : iZ
                    temp4 = nan(1, UpscaledRowLen.pq(j));
                    for ihRow = 1 : UpscaledRowLen.pq(j)
                        temp3 = nan(1, ihRow);
                        for hRowLag = 1 : ihRow
                            temp2 = nan(1, UpscaledColLen.pq(i));
                            for ihCol = 1 : UpscaledColLen.pq(i)
                                temp1 = nan(1, ihCol);
                                for hColLag = 1:ihCol
                                    temp1(hColLag) = trapz(AnalGamma(hRowLag,...
                                        1:hColLag,...
                                        ZLag));
                                end
                                temp2(ihCol) = trapz(temp1);
                            end
                            temp3(hRowLag) = trapz(temp2);
                        end
                        temp4(ihRow) = trapz(temp3);
                    end
                    temp5(ZLag) = trapz(temp4);
                end
                temp6(iZ) = trapz(temp5);
            end
            NormVariance_AnalCorrAvg(i, j, k) = (2/((UpscaledRowLen.pq(j)*...
                UpscaledColLen.pq(i)*UpscaledZLen.pq(k))^2))*trapz(temp6);

        end
    end
end

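The initialization here also preallocates the arrays, which is one of the basic recommendations for speeding up loops. In this case, however, it has no measurable effect.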

To benchmark the code, I used your snippet with rand, but with

nRow = 20;
nCol = 20;
nZ = 2;

On my computer, the cleaned-up code takes 25.5 seconds to complete.


How to make it faster? Good old vectorization:

The innermost loop over hColLag can be replaced by a single cumtrapz call, reducing the run time to 8.2 seconds.

temp1 = cumtrapz(AnalGamma(hRowLag, 1 : ihCol, ZLag), 2);
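
As a quick sanity check of this replacement, the sketch below compares the loop with cumtrapz on a small random array. The array size and the values of hRowLag, ZLag, and ihCol are arbitrary choices for illustration and are not taken from the question.

```matlab
% Illustrative check: the inner loop and cumtrapz produce the same vector.
% Sizes and indices below are arbitrary assumptions for this demo.
AnalGamma = rand(4, 6, 3);
hRowLag = 2; ZLag = 3; ihCol = 5;

% Loop version, as in the cleaned-up code
temp1Loop = nan(1, ihCol);
for hColLag = 1 : ihCol
    temp1Loop(hColLag) = trapz(AnalGamma(hRowLag, 1 : hColLag, ZLag));
end

% Vectorized version: cumulative trapezoidal integral along dimension 2
temp1Vec = cumtrapz(AnalGamma(hRowLag, 1 : ihCol, ZLag), 2);

% Both are 1-by-ihCol; the difference should be at round-off level
maxDiff = max(abs(temp1Loop - temp1Vec));
fprintf('max abs difference: %g\n', maxDiff);
```

The first element is 0 in both versions, because the trapezoidal integral over a single sample is zero.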

The same holds for the loop over ihCol, which brings the run time down to 1.85 seconds.

temp2 = cumtrapz(cumtrapz(AnalGamma(hRowLag, 1 : UpscaledColLen.pq(i), ZLag), 2), 2);

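The next outer loop, over hRowLag, only serves to compute several similar trapz calls. This is unnecessary because trapz is fully vectorized: the loop can be replaced by a single call, which reduces the run time to 0.34 seconds.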

temp3 = trapz(cumtrapz(cumtrapz(AnalGamma(1 : ihRow, 1 : UpscaledColLen.pq(i), ZLag), 2), 2), 2);
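
To illustrate what "fully vectorized" means here, the following sketch integrates every row of a matrix with one trapz call along dimension 2 and compares that with a row-by-row loop. The matrix M and its size are made up for the demonstration; each row plays the role of one temp2 vector.

```matlab
% Illustrative check: one trapz call along dimension 2 replaces a loop over rows.
M = rand(5, 7);                 % arbitrary demo matrix, one vector per row

% Loop version: integrate each row separately
rowLoop = nan(size(M, 1), 1);
for r = 1 : size(M, 1)
    rowLoop(r) = trapz(M(r, :));
end

% Vectorized version: integrate all rows at once along dimension 2
rowVec = trapz(M, 2);           % 5-by-1 column vector

maxDiff = max(abs(rowLoop - rowVec));
fprintf('max abs difference: %g\n', maxDiff);
```

Note that the vectorized result is a column vector, whereas temp3 in the loop version is a row vector. For the subsequent trapz and cumtrapz calls this only matters insofar as the integration dimension has to be given explicitly.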

The loop over ihRow computes a cumulative integral; using cumtrapz brings the run time down to about 0.07 seconds.

temp4 = cumtrapz(trapz(cumtrapz(cumtrapz(AnalGamma(1 : UpscaledRowLen.pq(j), 1 : UpscaledColLen.pq(i), ZLag), 2), 2), 2), 1);

Applying the same logic to the loops over iZ and ZLag, we arrive at the following code:

NormVariance_AnalCorrAvg = nan(numel(UpscaledColLen.pq), numel(UpscaledRowLen.pq), numel(UpscaledZLen.pq));
for k = 1 : numel(UpscaledZLen.pq)
    for j = 1 : numel(UpscaledRowLen.pq)
        for i = 1 : numel(UpscaledColLen.pq)
            temp6 = cumtrapz(trapz(cumtrapz(trapz(cumtrapz(cumtrapz(...
                AnalGamma(1 : UpscaledRowLen.pq(j), 1 : UpscaledColLen.pq(i), 1 : UpscaledZLen.pq(k)), ...
                2), 2), 2), 1), 1), 3);
            NormVariance_AnalCorrAvg(i, j, k) = (2/((UpscaledRowLen.pq(j)*...
                UpscaledColLen.pq(i)*UpscaledZLen.pq(k))^2))*trapz(temp6);

        end
    end
end

The run time is now 0.043 seconds, an improvement by a factor of about 600.

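Writing all these nested trapz and cumtrapz calls is, of course, very error-prone. I applied a checksum to the results saved to a file to make sure that my code changes did not alter them. Still, you should double-check to be completely sure.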
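
One possible way to do such a double-check, sketched here as an alternative to the checksum, is to run the loop version and the vectorized version side by side on a small random case and compare them up to round-off. The sizes, the short names rowLen, colLen, zLen, R, C, Z, and the tolerance are assumptions made for this illustration only.

```matlab
% Sketch: compare the nested-loop version with the vectorized version on a
% small random test case. Sizes and variable names are illustrative assumptions.
nRow = 4; nCol = 4; nZ = 2;
rowLen = find(rem(nRow, 1 : nRow) == 0);   % divisors of nRow, like UpscaledRowLen.pq
colLen = find(rem(nCol, 1 : nCol) == 0);   % like UpscaledColLen.pq
zLen   = find(rem(nZ,   1 : nZ)   == 0);   % like UpscaledZLen.pq
AnalGamma = rand(nRow, nCol, nZ);

ref = nan(numel(colLen), numel(rowLen), numel(zLen));  % loop result
vec = nan(size(ref));                                   % vectorized result
for k = 1 : numel(zLen)
    for j = 1 : numel(rowLen)
        for i = 1 : numel(colLen)
            R = rowLen(j); C = colLen(i); Z = zLen(k);
            scale = 2 / (R * C * Z)^2;

            % Reference: the cleaned-up nested loops
            temp6 = nan(1, Z);
            for iZ = 1 : Z
                temp5 = nan(1, iZ);
                for ZLag = 1 : iZ
                    temp4 = nan(1, R);
                    for ihRow = 1 : R
                        temp3 = nan(1, ihRow);
                        for hRowLag = 1 : ihRow
                            temp2 = nan(1, C);
                            for ihCol = 1 : C
                                temp1 = nan(1, ihCol);
                                for hColLag = 1 : ihCol
                                    temp1(hColLag) = trapz(AnalGamma(hRowLag, 1 : hColLag, ZLag));
                                end
                                temp2(ihCol) = trapz(temp1);
                            end
                            temp3(hRowLag) = trapz(temp2);
                        end
                        temp4(ihRow) = trapz(temp3);
                    end
                    temp5(ZLag) = trapz(temp4);
                end
                temp6(iZ) = trapz(temp5);
            end
            ref(i, j, k) = scale * trapz(temp6);

            % Candidate: the fully vectorized expression
            v6 = cumtrapz(trapz(cumtrapz(trapz(cumtrapz(cumtrapz( ...
                AnalGamma(1 : R, 1 : C, 1 : Z), 2), 2), 2), 1), 1), 3);
            vec(i, j, k) = scale * trapz(v6);
        end
    end
end

% Compare with a tolerance, since summation order differs between versions
maxRelDiff = max(abs(ref(:) - vec(:))) / max(1, max(abs(ref(:))));
fprintf('max relative difference: %g\n', maxRelDiff);
```

A tolerance-based comparison is used here because the two versions add the terms in a different order, so the results need not agree bit for bit.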


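It may be possible to go even further. The three outer for loops again do something cumulative, so perhaps more cumtrapz is in order. trapz is not implemented very efficiently, because it contains a lot of error-checking code. Since what it does is basically a sum of averages (see the trapezoidal rule), precomputing the multidimensional averages over the entire grid would allow these calls to be turned into calls to sum. Even simpler than the trapezoidal rule is the rectangle method, for which you evaluate the function not at the edges but at the centers of the grid cells. Of course, you could also try to compute part of the integral analytically, and you could reduce the resolution of the grid.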
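
The "sum of averages" remark can be made concrete with a small sketch. It assumes unit grid spacing, as in the trapz calls above, and uses made-up arrays y and A; it only illustrates the identity and is not a tuned replacement for the code in this answer.

```matlab
% Illustrative only: with unit spacing, trapz equals a sum of neighbor averages.
y = rand(1, 10);                          % arbitrary demo samples

viaTrapz = trapz(y);
viaSum   = sum((y(1 : end-1) + y(2 : end)) / 2);

% The same idea along one dimension of a 3-D array: precompute the averages
% once, then a plain sum replaces trapz.
A = rand(4, 5, 3);                        % arbitrary demo array
avgDim2   = (A(:, 1 : end-1, :) + A(:, 2 : end, :)) / 2;
viaTrapz3 = trapz(A, 2);                  % 4-by-1-by-3
viaSum3   = sum(avgDim2, 2);              % 4-by-1-by-3

diff1 = abs(viaTrapz - viaSum);
diff3 = max(abs(viaTrapz3(:) - viaSum3(:)));
fprintf('1-D difference: %g, 3-D difference: %g\n', diff1, diff3);
```

Whether this is actually faster than trapz for a given problem size would have to be measured.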

Hope this helps!