计算数组中元素的最快方法是什么？

Question

计算数组中元素的最快方法是什么？

EBH*_*EBH 7 arrays performance matlab count binning

在我的模型中,要完成的最重复的任务之一是计算数组中每个元素的数量.计数来自一个闭集,所以我知道有X各种类型的元素,并且全部或部分元素填充数组,以及代表"空"单元格的零.数组没有以任何方式排序,并且可能相当长(大约1M个元素),并且在一次模拟期间(这也是数百次模拟的一部分),该任务完成了数千次.结果应该是r大小的向量,数组中的数量也是X如此.r(k)k

例:

因为X = 9,如果我有以下输入向量:

v = [0 7 8 3 0 4 4 5 3 4 4 8 3 0 6 8 5 5 0 3]

Run Code Online (Sandbox Code Playgroud)

我想得到这个结果:

r = [0 0 4 4 3 1 1 3 0]

Run Code Online (Sandbox Code Playgroud)

请注意,我不希望计数零,并且未出现在数组中的元素(如2)0在结果向量(r(2) == 0)的相应位置具有a .

实现这一目标的最快方法是什么？

Answer 1

EBH*_*EBH 10

tl; dr:最快的方法取决于数组的大小.对于数组小于2 ¹⁴以下方法3( accumarray)更快.对于大于该方法的数组,下面的(2 histcounts)更好.

更新:我也使用2016b中引入的隐式广播对此进行了测试,结果几乎与bsxfun方法相同,此方法没有显着差异(相对于其他方法).

让我们看看执行此任务的可用方法有哪些.对于以下示例,我们假设X有n从1到1的元素,n我们感兴趣的数组M是一个可以在大小上变化的列数组.我们的结果向量将是spp¹,这spp(k)是ks 的数量M.虽然我在这里写X,但在下面的代码中没有明确的实现,我只是定义n = 500并X隐式1:500.

天真的`for`循环

处理此任务的最简单直接的方法是for循环迭代元素X并计算M与其相等的元素数量:

function spp = loop(M,n)
spp = zeros(n,1);
for k = 1:size(spp,1);
    spp(k) = sum(M==k); 
end
end

Run Code Online (Sandbox Code Playgroud)

这当然不是那么聪明,特别是如果只有很少的元素来X填充M,所以我们最好先看看那些已经存在的元素M:

function spp = uloop(M,n)
u = unique(M); % finds which elements to count
spp = zeros(n,1);
for k = u(u>0).';
    spp(k) = sum(M==k); 
end
end

Run Code Online (Sandbox Code Playgroud)

通常,在MATLAB中,建议尽可能利用内置函数,因为大多数时候它们都要快得多.我想到了5个选项:

1.功能 `tabulate`

该函数tabulate返回一个非常方便的频率表,乍一看似乎是这项任务的完美解决方案:

function tab = tabi(M)
tab = tabulate(M);
if tab(1)==0
    tab(1,:) = [];
end
end

Run Code Online (Sandbox Code Playgroud)

要做的唯一修复是删除表的第一行,如果它计算0元素(可能是没有零M).

2.功能 `histcounts`

另一种选择可以很容易地调整到我们的需要histcounts:

function spp = histci(M,n)
spp = histcounts(M,1:n+1);
end

Run Code Online (Sandbox Code Playgroud)

在这里,为了计算1到1之间的所有不同元素n,我们定义边缘1:n+1,因此每个元素X都有自己的bin.我们也可以写histcounts(M(M>0),'BinMethod','integers'),但我已经测试了它,它需要更多的时间(尽管它使函数独立n).

3.功能 `accumarray`

我将带来的下一个选项是使用该功能accumarray:

function spp = accumi(M)
spp = accumarray(M(M>0),1);
end

Run Code Online (Sandbox Code Playgroud)

这里我们将函数M(M>0)作为输入,跳过零,并1用作vals计数所有唯一元素的输入.

4.功能 `bsxfun`

我们甚至可以使用二元运算@eq(即==)来查找每种类型的所有元素:

function spp = bsxi(M,n)
spp = bsxfun(@eq,M,1:n);
spp = sum(spp,1);
end

Run Code Online (Sandbox Code Playgroud)

如果我们继续所述第一输入M和所述第二1:n不同的尺寸,从而一个是列向量的另一个是行向量,则该函数中的每个元素进行比较M,在每个元素1:n,并创建一个length(M)-by- n比我们可以归纳逻辑矩阵获得理想的结果.

5.功能 `ndgrid`

另一个类似的选项bsxfun是使用ndgrid函数显式创建所有可能性的两个矩阵:

function spp = gridi(M,n)
[Mx,nx] = ndgrid(M,1:n);
spp = sum(Mx==nx);
end

Run Code Online (Sandbox Code Playgroud)

然后我们比较它们并对列进行求和,得到最终结果.

标杆

我做了一点测试,找到了上面提到的最快的方法,我n = 500为所有的路径定义了.对于一些人(特别是天真的for),n对执行时间有很大的影响,但这不是问题,因为我们想要测试给定的n.

结果如下:

我们可以注意到几件事:

有趣的是,最快的方法有所改变.对于小于2 ^14的阵列accumarray是最快的.对于大于2 ^14的阵列histcounts,速度最快.
正如预期的那样for,两个版本中的天真循环都是最慢的,但对于小于2 ⁸的数组,"unique&for"选项较慢.ndgrid成为大于2 ^11的数组中最慢的,可能是因为需要在内存中存储非常大的矩阵.
tabulate在小于2 ^9的数组上工作的方式存在一些不规则性.在我进行的所有试验中,这个结果是一致的(在模式中有一些变化).

(bsxfun和ndgrid曲线被截断,因为它使我的计算机卡在更高的值,并且趋势已经很清楚)

另外,请注意y轴是log ₁₀,因此单位的减少(对于大小为2 ^19的数组,在accumarray和之间histcounts)意味着运算速度提高了10倍.

我很乐意在评论中听到对此测试的改进,如果您有另一种概念上不同的方法,欢迎您将其作为答案.

代码

以下是计时功能中包含的所有函数:

function out = timing_hist(N,n)
M = randi([0 n],N,1);
func_times = {'for','unique & for','tabulate','histcounts','accumarray','bsxfun','ndgrid';
    timeit(@() loop(M,n)),...
    timeit(@() uloop(M,n)),...
    timeit(@() tabi(M)),...
    timeit(@() histci(M,n)),...
    timeit(@() accumi(M)),...
    timeit(@() bsxi(M,n)),...
    timeit(@() gridi(M,n))};
out = cell2mat(func_times(2,:));
end

function spp = loop(M,n)
spp = zeros(n,1);
for k = 1:size(spp,1);
    spp(k) = sum(M==k); 
end
end

function spp = uloop(M,n)
u = unique(M);
spp = zeros(n,1);
for k = u(u>0).';
    spp(k) = sum(M==k); 
end
end

function tab = tabi(M)
tab = tabulate(M);
if tab(1)==0
    tab(1,:) = [];
end
end

function spp = histci(M,n)
spp = histcounts(M,1:n+1);
end

function spp = accumi(M)
spp = accumarray(M(M>0),1);
end

function spp = bsxi(M,n)
spp = bsxfun(@eq,M,1:n);
spp = sum(spp,1);
end

function spp = gridi(M,n)
[Mx,nx] = ndgrid(M,1:n);
spp = sum(Mx==nx);
end

Run Code Online (Sandbox Code Playgroud)

以下是运行此代码并生成图形的脚本:

N = 25; % it is not recommended to run this with N>19 for the `bsxfun` and `ndgrid` functions.
func_times = zeros(N,5);
for n = 1:N
    func_times(n,:) = timing_hist(2^n,500);
end
% plotting:
hold on
mark = 'xo*^dsp';
for k = 1:size(func_times,2)
    plot(1:size(func_times,1),log10(func_times(:,k).*1000),['-' mark(k)],...
        'MarkerEdgeColor','k','LineWidth',1.5);
end
hold off
xlabel('Log_2(Array size)','FontSize',16)
ylabel('Log_{10}(Execution time) (ms)','FontSize',16)
legend({'for','unique & for','tabulate','histcounts','accumarray','bsxfun','ndgrid'},...
    'Location','NorthWest','FontSize',14)
grid on

Run Code Online (Sandbox Code Playgroud)

¹ _{这个奇怪名字的原因来自我的领域,生态学.我的模型是细胞自动机,通常模拟虚拟空间中的个体生物(M上文).个体属于不同的物种(因此spp),并且它们一起形成所谓的"生态群落".社区的"状态"由每个物种的个体数量给出,这是spp这个答案的载体.在这个模型中,我们首先X为要抽取的个体定义一个物种库(上图),并且社区国家考虑物种库中的所有物种,而不仅仅是存在于物种库中的物种.M}

Answer 2

Dev*_*-iL 5

我们知道输入向量总是包含整数,那么为什么不使用它来"挤压"算法的更多性能呢？

我一直在尝试对OP提出的两种最佳分箱方法进行一些优化,这就是我想出的:

应将唯一值的数量(X在问题中或n在示例中)显式转换为(无符号)整数类型.
计算额外的bin然后丢弃它比"仅处理"有效值更快(参见accumi_new下面的函数).

此功能在我的机器上运行大约需要30秒.我正在使用MATLAB R2016a.

function q38941694
datestr(now)
N = 25;
func_times = zeros(N,4);
for n = 1:N
    func_times(n,:) = timing_hist(2^n,500);
end
% Plotting:
figure('Position',[572 362 758 608]);
hP = plot(1:n,log10(func_times.*1000),'-o','MarkerEdgeColor','k','LineWidth',2);
xlabel('Log_2(Array size)'); ylabel('Log_{10}(Execution time) (ms)')
legend({'histcounts (double)','histcounts (uint)','accumarray (old)',...
  'accumarray (new)'},'FontSize',12,'Location','NorthWest')
grid on; grid minor;
set(hP([2,4]),'Marker','s'); set(gca,'Fontsize',16);
datestr(now)
end

function out = timing_hist(N,n)
% Convert n into an appropriate integer class:
if n < intmax('uint8')
  classname = 'uint8';
  n = uint8(n);
elseif n < intmax('uint16')
  classname = 'uint16';
  n = uint16(n);
elseif n < intmax('uint32')
  classname = 'uint32';
  n = uint32(n);
else % n < intmax('uint64')  
  classname = 'uint64';
  n = uint64(n);
end
% Generate an input:
M = randi([0 n],N,1,classname);
% Time different options:
warning off 'MATLAB:timeit:HighOverhead'
func_times = {'histcounts (double)','histcounts (uint)','accumarray (old)',...
  'accumarray (new)';
    timeit(@() histci(double(M),double(n))),...
    timeit(@() histci(M,n)),...
    timeit(@() accumi(M)),...
    timeit(@() accumi_new(M))
    };
out = cell2mat(func_times(2,:));
end

function spp = histci(M,n)
  spp = histcounts(M,1:n+1);
end

function spp = accumi(M)
  spp = accumarray(M(M>0),1);
end

function spp = accumi_new(M)
  spp = accumarray(M+1,1);
  spp = spp(2:end);
end

Run Code Online (Sandbox Code Playgroud)

归档时间：	9 年，6 月前
查看次数：	1238 次
最近记录：	7 年，2 月前

计算数组中元素的最快方法是什么？

例:

天真的for循环

1.功能 tabulate

2.功能 histcounts

3.功能 accumarray

4.功能 bsxfun

5.功能 ndgrid

标杆

代码

天真的`for`循环

1.功能 `tabulate`

2.功能 `histcounts`

3.功能 `accumarray`

4.功能 `bsxfun`

5.功能 `ndgrid`