fmad=false gives good performance

Question

From the Nvidia release notes:

 The nvcc compiler switch, --fmad (short name: -fmad), to control the contraction of
 floating-point multiplies and add/subtracts into floating-point multiply-add
 operations (FMAD, FFMA, or DFMA) has been added:
 --fmad=true and --fmad=false enables and disables the contraction respectively.
 This switch is supported only when the --gpu-architecture option is set with
 compute_20, sm_20, or higher. For other architecture classes, the contraction is
 always enabled.
 The --use_fast_math option implies --fmad=true, and enables the contraction.
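
To make "contraction" concrete, here is a minimal host-side C++ sketch of the two ways `a * b + c` can be evaluated. It is not taken from the release notes: the function names are hypothetical, and the `volatile` store is only a device to stop a host compiler from fusing the operations itself. It assumes IEEE single precision with round-to-nearest-even.

```cpp
#include <cassert>
#include <cmath>

// Multiply and add as two separately rounded operations, which is the shape
// of code that --fmad=false asks for. The volatile store forces the product
// to be rounded to float before the add.
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

// Fused multiply-add: a * b + c is computed exactly and rounded once.
float fused_mul_add(float a, float b, float c) {
    return std::fma(a, b, c);
}

// Worked example (hypothetical inputs): with a = 1 + 2^-12 the exact square
// is 1 + 2^-11 + 2^-24. Rounding the product to float drops the 2^-24 term,
// so the unfused result is 0 while the fused result keeps 2^-24.
void check_contraction_example() {
    const float a = 1.0f + std::ldexp(1.0f, -12);
    const float c = -(1.0f + std::ldexp(1.0f, -11));
    assert(mul_then_add(a, a, c) == 0.0f);
    assert(fused_mul_add(a, a, c) == std::ldexp(1.0f, -24));
}
```

The two forms differ in instruction count (one operation instead of two) as well as in rounding, which is why the switch can affect both results and performance.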

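I have two kernels: one is purely compute-bound, with lots of multiplications, while the other is memory-bound. With -fmad=false I see a consistent performance improvement (about 5%) for the compute-intensive kernel, and about the same percentage of degradation when I turn FMA off for the memory-bound kernel. So FMA works better for my memory-bound kernel, but my compute-bound kernel can squeeze out a little performance with it turned off. What could be the reason? My device is an M2090 and I am using CUDA 4.2.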

Complete compile options: -arch,sm_20,-ftz=true,-prec-div=false,-prec-sqrt=false,-use_fast_math,-fmad=false (or I simply remove -fmad=false, since FMA contraction is on by default).

Answer 1

nju*_*ffa 7

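Using FMA can slightly increase register pressure, since three source operands must be available at the same time. Turning FMA generation on or off can therefore cause small differences in instruction scheduling and register allocation, which in turn lead to small performance differences. For a compute-bound kernel with many multiply-add idioms, -fmad=true should make a significant performance difference. But as you say, your kernel is dominated by multiplies, so there is little to gain from FMA, and any gain may be offset by the register pressure and instruction scheduling effects.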
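
The distinction between multiply-dominated code and multiply-add idioms can be sketched as follows. This is an illustration in plain C++, not the asker's kernels: both function names are hypothetical, and the loops only show which expression shapes give a compiler something to contract.

```cpp
#include <cassert>
#include <cstddef>

// Multiply-dominated work: each step is a bare multiply with no dependent
// add, so there is no multiply-add pair to contract into an FMA.
float product_chain(const float* x, std::size_t n) {
    float acc = 1.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc *= x[i];
    }
    return acc;
}

// Multiply-add idiom (Horner's rule): every step is acc * x + coeff, the
// pattern that contraction turns into a single FMA.
// coeffs are ordered from the highest-degree term down to the constant.
float horner(const float* coeffs, std::size_t n, float x) {
    assert(n > 0);
    float acc = coeffs[0];
    for (std::size_t i = 1; i < n; ++i) {
        acc = acc * x + coeffs[i];
    }
    return acc;
}
```

For real kernels, one way to look for the register-pressure effect is to build with each `--fmad` setting while passing `-Xptxas -v` to nvcc, which prints per-kernel register usage, and compare the numbers.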
