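While trying to benchmark implementations of a simple sparse unit lower-triangular backsolve in CSC format, I observed strange behavior. Performance seems to vary greatly depending on where the assembly instructions end up in the executable. I have observed this in many different variants of the same problem. One minimal example is to duplicate the instructions of the implementation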
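```cpp
// EntryIndex, Index and Value are type aliases defined elsewhere (not shown in this excerpt).
void lowerUnitTriangularTransposedBacksolve(const EntryIndex* col_begin_indices,
                                            const Index* row_indices,
                                            const Value* values,
                                            const Index dimension, Value* x) {
  if (dimension == 0) return;
  // Walk the columns from last to first; entry_index starts one past the last stored entry.
  EntryIndex entry_index = col_begin_indices[dimension];
  Index col_index = dimension;
  do {
    col_index -= 1;
    const EntryIndex col_begin = col_begin_indices[col_index];
    if (entry_index > col_begin) {  // skip empty columns
      Value x_temp = x[col_index];
      // Subtract this column's stored off-diagonal contributions (unit diagonal is implicit).
      do {
        entry_index -= 1;
        x_temp -= values[entry_index] * x[row_indices[entry_index]];
      } while (entry_index != col_begin);
      x[col_index] = x_temp;
    }
  } while (col_index != 0);
}
```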
in two functions, `benchmarkBacksolve1` and `benchmarkBacksolve2` …
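To make the CSC layout concrete, here is a small self-contained harness around the routine above. It is a sketch under stated assumptions: the type aliases and the helper `solveExample` are hypothetical (the real definitions are not shown), and it assumes each CSC column stores only the strictly lower entries of a unit lower-triangular `L`, with the diagonal implicit. Under that assumption the routine overwrites `x` (initially the right-hand side `b`) with the solution of `L^T x = b`, processing columns from last to first so that every `x[row]` it reads is already final.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical type aliases; the question does not show the real definitions.
using EntryIndex = std::int64_t;
using Index = std::int64_t;
using Value = double;

// Routine from the question (reformatted). Assuming each column holds only the
// strictly lower entries of a unit lower-triangular L, it solves L^T x = b in place.
void lowerUnitTriangularTransposedBacksolve(const EntryIndex* col_begin_indices,
                                            const Index* row_indices,
                                            const Value* values,
                                            const Index dimension, Value* x) {
  if (dimension == 0) return;
  EntryIndex entry_index = col_begin_indices[dimension];
  Index col_index = dimension;
  do {
    col_index -= 1;
    const EntryIndex col_begin = col_begin_indices[col_index];
    if (entry_index > col_begin) {
      Value x_temp = x[col_index];
      do {
        entry_index -= 1;
        x_temp -= values[entry_index] * x[row_indices[entry_index]];
      } while (entry_index != col_begin);
      x[col_index] = x_temp;
    }
  } while (col_index != 0);
}

// Hypothetical helper: solves L^T x = b for
//   L   = [1 0 0; 2 1 0; 3 4 1]   (so L^T = [1 2 3; 0 1 4; 0 0 1])
//   b   = L^T * [1, 1, 1] = [6, 5, 1]
// The expected solution is therefore [1, 1, 1].
std::vector<Value> solveExample() {
  // Column j occupies entries [col_begin[j], col_begin[j + 1]).
  const std::vector<EntryIndex> col_begin = {0, 2, 3, 3};
  const std::vector<Index> rows = {1, 2, 2};        // strictly lower row indices
  const std::vector<Value> vals = {2.0, 3.0, 4.0};  // L(1,0), L(2,0), L(2,1)
  std::vector<Value> x = {6.0, 5.0, 1.0};           // b on entry, solution on exit
  lowerUnitTriangularTransposedBacksolve(col_begin.data(), rows.data(), vals.data(),
                                         static_cast<Index>(x.size()), x.data());
  return x;
}
```

Calling `solveExample()` should return `{1.0, 1.0, 1.0}`; small integers keep the check exact in floating point. A benchmark would call the same routine on a much larger matrix instead.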
c++ performance benchmarking cpu-architecture branch-prediction
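In this minimal example, is it allowed to pass the optional dummy argument `y` of `test_wrapper`, which may be absent, as the actual argument corresponding to the optional dummy argument `y` of `test`?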
```fortran
program main
  implicit none
  real :: x = 5.0
  call test_wrapper(x)
contains
  subroutine test_wrapper(x, y)
    implicit none
    real, intent(in) :: x
    real, dimension(:), intent(out), optional :: y
    call test(x, y)
  end subroutine test_wrapper
  subroutine test(x, y)
    implicit none
    real, intent(in) :: x
    real, dimension(:), intent(out), optional :: y
    if (present(y)) then
      y = x
    end if
  end subroutine test
end program
```
UndefinedBehaviorSanitizer reports an error, which suggests that it is not: https://godbolt.org/z/nKj1h6G9r
standards fortran optional-parameters sanitizer undefined-behavior
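I noticed that the `==` operator behaves strangely, to my mind, for floating-point types. I know that I cannot expect something like `0.1 + 0.2 == 0.3` to be `.true.` because of the limitations of floating-point representation, and that floating-point comparisons should therefore generally be done with something like `abs(x - y) < tolerance`. However, I would still expect this minimal program to print `T` in every case: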
```fortran
program main
  integer, parameter :: dp = kind(0d0)
  real(kind=dp) :: a, b, c
  a = 4.4090680619790817d+002
  b = 1.0000000000000000d-004
  c = (a + b)
  print *, (c == (a + b))
end program
```
When compiling this program with gfortran 7.3.1 on 64-bit Manjaro Linux with
```shell
gfortran -o a.out minimal_example.F90 && ./a.out
```
I do indeed get the output `T`. However, when compiling and running a 32-bit executable with
```shell
gfortran -m32 -o a.out minimal_example.F90 && ./a.out
```
the result is `F`. It looks to me as if storing the result of the addition slightly changes its value, since the difference `abs(c - (a + b))` is roughly …
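The tolerance-based comparison mentioned above, and the effect of holding a sum at higher precision than the variable it is stored in, can be sketched in C++ (the language of the first question). The names `approximatelyEqual` and `storedVsExtendedGap` and the default tolerances are hypothetical choices for illustration. The second function assumes that `long double` is wider than `double`, as on x86-64 Linux. One commonly cited reason for `-m32` differences is that 32-bit x86 builds use x87 instructions by default, which can keep `a + b` in an 80-bit register while `c` is rounded to a 64-bit `double` when stored; this sketch only illustrates that size of discrepancy and does not reproduce the question's build.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Tolerance-based equality: accept if the difference is below an absolute
// tolerance (useful near zero) or below a tolerance relative to the larger
// magnitude. The default tolerances are illustrative assumptions only.
bool approximatelyEqual(double x, double y,
                        double rel_tol = 1e-12, double abs_tol = 1e-15) {
  const double diff = std::fabs(x - y);
  return diff <= abs_tol ||
         diff <= rel_tol * std::max(std::fabs(x), std::fabs(y));
}

// Gap between a + b rounded to double (as when stored in a double variable,
// assuming the compiler rounds on assignment, e.g. SSE on x86-64) and the
// same sum computed in long double. Where long double is no wider than
// double, the gap is 0.
long double storedVsExtendedGap(double a, double b) {
  const double stored = a + b;  // rounded to a 53-bit significand
  const long double extended =
      static_cast<long double>(a) + static_cast<long double>(b);
  return std::fabs(extended - static_cast<long double>(stored));
}
```

With the question's values, the gap should stay within one unit in the last place of `c`, which is the scale of difference that can flip an exact `==` while leaving `approximatelyEqual` unaffected.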
fortran ×2
benchmarking ×1
c++ ×1
gfortran ×1
performance ×1
precision ×1
sanitizer ×1
standards ×1