Parallel computing in Octave on a single machine - packages and examples

Question

db1*_*234 11 parallel-processing octave

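I want to parallelize a for loop in Octave on a single machine (as opposed to a cluster). A while ago I asked a question about a parallel version of Octave: parallel computing in Octave.

The answer suggested that I download a parallel computing package, which I did. The package seems mostly geared towards cluster computing. It does mention single-machine parallel computing, but it is not clear how to run even a simple parallel loop.

I also found another question on this topic, but it did not give a good answer on parallelizing loops in Octave: Running portions of a loop in parallel with Octave?

Does anybody know where I can find an example of running a for loop in parallel in Octave?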

Answer 1

use*_*610 13

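I am computing a large number of RGB histograms, and I need explicit loops to do it, so each histogram takes a noticeable amount of time. Running the computations in parallel therefore makes sense. Octave has an (experimental) function, parcellfun, written by Jaroslav Hajek, that can be used for this.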

My original loop:

histograms = zeros(size(files,2), bins^3);
% calculate histogram for each image
for c = 1 : size(files,2)
  I = imread(fullfile(dir, files{c}));
  h = myhistRGB(I, bins);
  histograms(c, :) = h(:); % change to 1D vector
end

To use parcellfun, I had to refactor the loop body into a separate function:

function histogram = loadhistogramp(file)
  I = imread(fullfile('.', file));
  h = myhistRGB(I, 8);
  histogram = h(:); % change to 1D vector
end

Then I can call it like this:

histograms = parcellfun(8, @loadhistogramp, files);
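
The same call pattern can be tried without any image files. This is a minimal sketch, assuming the Octave Forge `parallel` package is installed (recent Octave Forge releases ship `parcellfun` in `parallel` rather than in `general`, which is what the answer loads). `column_stats`, `inputs` and `scale` are hypothetical stand-ins for `loadhistogramp`, the file list and the bin count. It compares a serial `cellfun` run with `parcellfun`, using `"UniformOutput", false` and explicit concatenation so that the layout of the result (one column per input) is unambiguous.

```octave
1;  % script marker so this file may define functions before using them

% Hypothetical stand-in for loadhistogramp: maps one cell element to a
% fixed-length column vector. 'scale' plays the role of the bin count
% that loadhistogramp hard-codes.
function h = column_stats(x, scale)
  h = scale * [min(x); max(x); sum(x)];
endfunction

inputs = {1:10, [5 -3 8], 0:0.5:2};   % hypothetical stand-ins for file names
scale = 2;
f = @(x) column_stats(x, scale);      % capture extra parameters in a closure

% Serial reference: one output cell per input, then one column per input.
serial = cellfun(f, inputs, "UniformOutput", false);
serial = [serial{:}];

% Parallel version: same call shape, plus the number of worker processes.
pkg load parallel                     % assumption: Octave Forge "parallel" package
nworkers = min(nproc(), numel(inputs));
par = parcellfun(nworkers, f, inputs, "UniformOutput", false);
par = [par{:}];

disp(isequal(serial, par))
```

Capturing parameters in an anonymous function avoids hard-coding values such as the bin count or directory inside the per-item function.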

I ran a small benchmark on my computer, which has 4 physical cores with Intel HyperThreading enabled.

My original code:

tic(); histograms2 = loadhistograms('images.txt', 8); toc();
warning: your version of GraphicsMagick limits images to 8 bits per pixel
Elapsed time is 107.515 seconds.

With parcellfun:

octave:1> pkg load general; tic(); histograms = loadhistogramsp('images.txt', 8); toc();
parcellfun: 0/178 jobs donewarning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
warning: your version of GraphicsMagick limits images to 8 bits per pixel
parcellfun: 178/178 jobs done
Elapsed time is 29.02 seconds.

The results of the parallel and serial versions are identical (only transposed):

octave:6> sum(sum((histograms'.-histograms2).^2))
ans = 0

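When I repeated this several times, the run times were almost identical every time. The parallel version took about 30 seconds (± about 2 s) with 4, 8 and 16 subprocesses alike.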
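
For reference, the two timings above imply the following speedup S and parallel efficiency E relative to the 4 physical cores (a calculation from the reported numbers, not an additional measurement):

$$
S = \frac{T_{\text{serial}}}{T_{\text{parallel}}} = \frac{107.515\ \text{s}}{29.02\ \text{s}} \approx 3.70,
\qquad
E = \frac{S}{4} \approx 0.93
$$

An efficiency close to 1 with 4 workers, and no further gain at 8 or 16, is consistent with the job being bound by the 4 physical cores, with HyperThreading adding little for this workload; that reading is an inference from the numbers, not something the benchmark tested separately.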

Answer 2

Jon*_*rsi 10

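Octave loops are slow, slow, slow, and you are far better off expressing things as array-wise operations. Take the example of evaluating a simple trig function over a 2D domain, as in this 3D Octave graphics example (but with a more realistic number of points for computation, as opposed to plotting):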

vectorized.m:

tic()
x = -2:0.01:2;
y = -2:0.01:2;
[xx,yy] = meshgrid(x,y);
z = sin(xx.^2-yy.^2);
toc()

Converting it to for loops gives us forloops.m:

tic()
x = -2:0.01:2;
y = -2:0.01:2;
z = zeros(numel(y), numel(x));
% rows follow y and columns follow x, matching meshgrid(x,y) in vectorized.m
for i=1:numel(y)
    for j=1:numel(x)
        lx = x(j);
        ly = y(i);
        z(i,j) = sin(lx^2 - ly^2);
    endfor
endfor
toc()
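
When converting between the two forms, the index convention matters: `meshgrid(x, y)` lays y along the rows and x along the columns, so a loop that reproduces it must index `z(i,j)` with `y(i)` and `x(j)`. Swapping them yields a transposed array, which for this antisymmetric expression is also sign-flipped. A minimal self-contained check of that convention (same grid as vectorized.m; the variable names are illustrative):

```octave
x = -2:0.01:2;
y = -2:0.01:2;

% Vectorized: meshgrid(x, y) puts y along the rows and x along the columns.
[xx, yy] = meshgrid(x, y);
z_vec = sin(xx.^2 - yy.^2);

% Loop version with the same convention: row i <-> y(i), column j <-> x(j).
z_loop = zeros(numel(y), numel(x));   % preallocate once, outside the loops
for i = 1:numel(y)
  for j = 1:numel(x)
    z_loop(i,j) = sin(x(j)^2 - y(i)^2);
  endfor
endfor

% Both compute the same elementwise expression, so they should agree.
disp(max(abs(z_loop(:) - z_vec(:))))
```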

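Note that the vectorized version already "wins" by being simpler and clearer, but it has another important advantage too: the timings are dramatically different.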

$ octave --quiet vectorized.m 
Elapsed time is 0.02057 seconds.

$ octave --quiet forloops.m 
Elapsed time is 2.45772 seconds.

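So if you were using for loops and had perfect parallelism with no overhead, you would have to spread this over 119 processors (2.458 s / 0.0206 s ≈ 119) just to break even with the loop-free version!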

Don't get me wrong, parallelism is great, but first make things run efficiently in serial.

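Almost all of Octave's built-in functions are already vectorized, in the sense that they work equally well on a scalar or on a whole array, so it is usually easy to turn a computation into array operations instead of working element by element. For the cases where that is not so easy, there is usually a utility function that already exists to help, such as meshgrid, which builds a 2D grid from the Cartesian product of two vectors.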
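
A further step in the same direction: since Octave 3.6 the arithmetic operators broadcast automatically, so combining a row vector with a column vector expands both to a full matrix. For this particular example the meshgrid call then becomes optional. A minimal sketch reusing the grid from vectorized.m (variable names are illustrative):

```octave
x = -2:0.01:2;      % row vector (1 x 401)
y = (-2:0.01:2)';   % column vector (401 x 1)

% Reference: explicit grid, as in vectorized.m
[xx, yy] = meshgrid(x, y);
z_mesh = sin(xx.^2 - yy.^2);

% Broadcasting: row minus column expands to a 401 x 401 matrix with
% element (i,j) = x(j)^2 - y(i)^2, the same layout meshgrid produces.
z_bc = sin(x.^2 - y.^2);

printf("size: %d x %d, max difference: %g\n", rows(z_bc), columns(z_bc), ...
       max(abs(z_bc(:) - z_mesh(:))));
```

Broadcasting avoids materializing the two intermediate grids `xx` and `yy`; meshgrid remains the more explicit choice when the grids themselves are needed, for example for plotting.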

Archived: 14 years ago
Views: 14428
Last activity: 11 years, 9 months ago