pan*_*ant 9 c++ simd vectorization openmp
I'm trying to use SIMD-enabled functions and to vectorize a loop that contains a function call.
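For context, `#pragma omp declare simd` asks the compiler to generate vector variants ("SIMD clones") of a function alongside the scalar version. A vectorized caller can then pass several arguments in one call. The sketch below only illustrates that contract. `BlackBoxFunction4` is a hypothetical, hand-written stand-in: a real clone operates on vector registers and has a compiler-generated name.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// With -fopenmp or -fopenmp-simd, this pragma asks the compiler to emit
// SIMD clones of the function in addition to the scalar version.
#pragma omp declare simd
double BlackBoxFunction(const double x) {
    return 1.0 / std::sqrt(x);
}

// Hypothetical stand-in for a 4-lane SIMD clone: one call processes four
// independent arguments, one per lane. This loop is scalar; a real clone
// would do the same work with vector instructions.
std::array<double, 4> BlackBoxFunction4(const std::array<double, 4>& x) {
    std::array<double, 4> y{};
    for (std::size_t lane = 0; lane < y.size(); ++lane) {
        y[lane] = BlackBoxFunction(x[lane]);
    }
    return y;
}
```

Each lane is independent, which is what makes it legal for the compiler to compute all four at once.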
#include <cmath>
#pragma omp declare simd
double BlackBoxFunction(const double x) {
    return 1.0/sqrt(x);
}
double ComputeIntegral(const int n, const double a, const double b) {
    const double dx = (b - a)/n;
    double I = 0.0;
    #pragma omp simd reduction(+: I)
    for (int i = 0; i < n; i++) {
        const double xip12 = a + dx*(double(i) + 0.5);
        const double yip12 = BlackBoxFunction(xip12);
        const double dI = yip12*dx;
        I += dI;
    }
    return I;
}
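When experimenting with compiler flags, it helps to confirm that a vectorized build still computes the right value. For f(x) = 1/√x the exact integral over [a, b] is 2(√b − √a), and the midpoint sum should approach it as n grows. This is a minimal sketch: `ExactIntegral` is a hypothetical helper, and `ComputeIntegral` is the question's function, lightly condensed. Without -fopenmp the pragmas are simply ignored.

```cpp
#include <cmath>

#pragma omp declare simd
double BlackBoxFunction(const double x) {
    return 1.0 / std::sqrt(x);
}

// Midpoint-rule integral from the question (temporaries folded together).
double ComputeIntegral(const int n, const double a, const double b) {
    const double dx = (b - a) / n;
    double I = 0.0;
    #pragma omp simd reduction(+: I)
    for (int i = 0; i < n; i++) {
        const double xip12 = a + dx * (double(i) + 0.5);
        I += BlackBoxFunction(xip12) * dx;
    }
    return I;
}

// Hypothetical helper: closed form of the integral of 1/sqrt(x) over
// [a, b], valid for 0 <= a <= b.
double ExactIntegral(const double a, const double b) {
    return 2.0 * (std::sqrt(b) - std::sqrt(a));
}
```

A vectorized reduction adds the terms in a different order, so scalar and vector builds should be compared with a small tolerance rather than bit for bit.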
If I compile the code above with icpc as follows:
icpc worker.cc -qopenmp -qopt-report=5 -c
the optimization report shows that both the function and the loop are vectorized. However, if I compile it with g++ 6.5:
g++ worker.cc -O3 -fopenmp -fopt-info-vec-missed -funsafe-math-optimizations -c
the output shows "note: not vectorized: control flow in loop." and "note: bad loop form.", and the loop is not vectorized.
How can I get GCC to vectorize this loop?
EDIT:
If I move the function into a separate file:
worker.cc:
#include "library.h"
double ComputeIntegral(const int n, const double a, const double b) {
    const double dx = (b - a)/n;
    double I = 0.0;
    #pragma omp simd reduction(+: I)
    for (int i = 0; i < n; i++) {
        const double xip12 = a + dx*(double(i) + 0.5);
        const double yip12 = BlackBoxFunction(xip12);
        const double dI = yip12*dx;
        I += dI;
    }
    return I;
}
library.h:
#ifndef __INCLUDED_LIBRARY_H__
#define __INCLUDED_LIBRARY_H__
#pragma omp declare simd
double BlackBoxFunction(const double x);
#endif
and library.cc:
#include <cmath>
#pragma omp declare simd
double BlackBoxFunction(const double x) {
    return 1.0/sqrt(x);
}
and then compile with GCC:
g++ worker.cc library.cc -O3 -fopenmp -fopt-info-vec-missed -funsafe-math-optimizations -c
the output shows:
worker.cc:9:31: note: loop vectorized
but also:
library.cc:5:18: note:not vectorized: control flow in loop.
library.cc:5:18: note:bad loop form.
This confuses me. Has the loop actually been vectorized or not?
Answer: after a small modification, the code can be vectorized with gcc:
#include <cmath>
double BlackBoxFunction(const double x) {
    return 1.0/sqrt(x);
}
double ComputeIntegral(const int n, const double a, const double b) {
    const double dx = (b - a)/n;
    double I = 0.0;
    double d_i = 0.0;
    for (int i = 0; i < n; i++) {
        const double xip12 = a + dx*(d_i + 0.5);
        d_i = d_i + 1.0;
        const double yip12 = BlackBoxFunction(xip12);
        const double dI = yip12*dx;
        I += dI;
    }
    return I;
}
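The change replaces the int-to-double conversion `double(i)` with a separate double counter `d_i` that is incremented by 1.0. Every integer below 2^53 is exactly representable as a double, so for any realistic n both versions compute exactly the same sample points. The sketch below puts the two loops side by side so they can be checked against each other. The function names are hypothetical.

```cpp
#include <cmath>

double BlackBoxFunction(const double x) {
    return 1.0 / std::sqrt(x);
}

// Question's version: the int counter is converted to double every iteration.
double ComputeIntegralIntCounter(const int n, const double a, const double b) {
    const double dx = (b - a) / n;
    double I = 0.0;
    for (int i = 0; i < n; i++) {
        const double xip12 = a + dx * (double(i) + 0.5);
        I += BlackBoxFunction(xip12) * dx;
    }
    return I;
}

// Answer's version: a separate double counter avoids the conversion.
// d_i stays exact because it only holds small whole numbers.
double ComputeIntegralDoubleCounter(const int n, const double a, const double b) {
    const double dx = (b - a) / n;
    double I = 0.0;
    double d_i = 0.0;
    for (int i = 0; i < n; i++) {
        const double xip12 = a + dx * (d_i + 0.5);
        d_i = d_i + 1.0;
        I += BlackBoxFunction(xip12) * dx;
    }
    return I;
}
```

Without floating-point reassociation the two functions should return identical results. With -Ofast, the vectorized reduction may reorder additions, so compare them with a tolerance.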
This was compiled with the options -Ofast -march=haswell -fopt-info-vec-missed -funsafe-math-optimizations. The main loop compiles to:
.L7:
        vaddpd ymm2, ymm4, ymm7
        inc eax
        vaddpd ymm4, ymm4, ymm8
        vfmadd132pd ymm2, ymm9, ymm5
        vsqrtpd ymm2, ymm2
        vdivpd ymm2, ymm6, ymm2
        vfmadd231pd ymm3, ymm5, ymm2
        cmp eax, edx
        jne .L7
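Reading the listing, ymm4 appears to hold four lane counters, stepped each iteration by ymm8. ymm3 appears to hold four partial sums, updated by the final vfmadd231pd. These register roles are inferred from the instructions, not taken from GCC documentation. The scalar C++ rendering below mimics that scheme and makes its one semantic change visible. The sum is built from four interleaved partial sums that are combined at the end, which reorders the floating-point additions. That reordering is what the reduction clause, or -funsafe-math-optimizations / -Ofast, permits. `ComputeIntegral4Lanes` is a hypothetical name. Its remainder loop stands in for the scalar epilogue that handles n not divisible by 4.

```cpp
#include <cmath>

// Scalar sketch of the 4-lane loop above (register roles are inferred).
double ComputeIntegral4Lanes(const int n, const double a, const double b) {
    const double dx = (b - a) / n;
    double d[4]   = {0.0, 1.0, 2.0, 3.0};   // per-lane counters (like ymm4)
    double sum[4] = {0.0, 0.0, 0.0, 0.0};   // per-lane partial sums (like ymm3)
    const int n4 = n - n % 4;                // iterations done 4 at a time
    for (int i = 0; i < n4; i += 4) {
        for (int lane = 0; lane < 4; ++lane) {
            const double x = std::fma(d[lane] + 0.5, dx, a);  // vaddpd + vfmadd132pd
            const double y = 1.0 / std::sqrt(x);              // vsqrtpd + vdivpd
            sum[lane] = std::fma(dx, y, sum[lane]);           // vfmadd231pd
            d[lane] += 4.0;                                   // vaddpd by {4,4,4,4}
        }
    }
    // Combine the partial sums, then handle the leftover iterations.
    double I = (sum[0] + sum[1]) + (sum[2] + sum[3]);
    for (int i = n4; i < n; ++i) {
        I += dx * (1.0 / std::sqrt(a + dx * (double(i) + 0.5)));
    }
    return I;
}
```

The result agrees with the sequential loop only up to rounding, which is why GCC will not perform this transformation without permission to reassociate.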
See the following Godbolt link.
I removed the #pragma omp ... lines because they did not improve vectorization, but they did not make it worse either.
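Note that changing the compiler option from -O3 to -Ofast is, by itself, enough to enable vectorization. Nevertheless, a double counter is more efficient than an int counter that has to be converted to double in every iteration.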
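The answer does not say why -Ofast helps. A likely explanation, inferred here rather than confirmed by the vectorizer output, involves errno. -Ofast enables -ffast-math, which includes -fno-math-errno. -funsafe-math-optimizations, which the question already passes, does not include it. While errno semantics are in effect, sqrt of a negative argument must set errno to EDOM. GCC then typically emits the square-root instruction plus a branch to the library sqrt for NaN results, and that branch plausibly explains the "control flow in loop" note. If so, adding -fno-math-errno to -O3 should also suffice, though that is untested here. The hypothetical helper `SqrtSetsErrno` below only demonstrates the side effect the compiler must preserve.

```cpp
#include <cerrno>
#include <cmath>

// Hypothetical helper: returns true if std::sqrt on a negative argument
// set errno to EDOM. When math_errhandling includes MATH_ERRNO (e.g. glibc
// without -ffast-math), the compiler has to preserve this side effect, so
// it cannot reduce sqrt to a bare square-root instruction.
bool SqrtSetsErrno() {
    volatile double negative = -1.0;          // volatile: prevent constant folding
    errno = 0;
    volatile double r = std::sqrt(negative);  // domain error: result is NaN
    (void)r;
    return errno == EDOM;
}
```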
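Also note that the vectorization report can be quite misleading: check the generated assembly to see whether vectorization actually succeeded.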