Pau*_*l R 17
In most cases prefetch instructions are of little or no benefit, and they can even be counter-productive. Most modern CPUs have an automatic prefetch mechanism that works well enough that adding software prefetch hints achieves little; the hints can even interfere with automatic prefetch and reduce performance.
In some rare cases, such as when you are streaming large blocks of data on which you do very little actual processing, you may manage to hide some latency with software-initiated prefetching, but it's very hard to get right. You need to start the prefetch several hundred cycles before you use the data: too late and you still get a cache miss, too early and the data may be evicted from cache before you are ready to use it. Often this puts the prefetch in some unrelated part of the code, which is bad for modularity and software maintenance. Worse still, if your architecture changes (new CPU, different clock speed, etc.) such that DRAM access latency increases or decreases, you may need to move your prefetch instructions to another part of the code to keep them effective.
Anyway, if you feel you really must use prefetch, I recommend #ifdefs around any prefetch instructions so that you can compile your code with and without prefetch and see if it is actually helping (or hindering) performance, e.g.
#ifdef USE_PREFETCH
// prefetch instruction(s)
#endif
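As a rough sketch of what that guard might look like in practice with GCC or Clang: the function name `sum_array` and the `PREFETCH_DISTANCE` value below are made up for illustration, and `__builtin_prefetch` is a compiler builtin rather than portable C, so treat this as a starting point, not a recommendation.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch.
   The right value depends on the CPU and has to be measured. */
#ifndef PREFETCH_DISTANCE
#define PREFETCH_DISTANCE 64
#endif

/* Sum an array, optionally issuing a prefetch hint ahead of each read. */
long sum_array(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
#ifdef USE_PREFETCH
        /* GCC/Clang builtin: second argument 0 = read, third 0 = low
           temporal locality. The bounds check keeps the address
           computation inside the array. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 0);
#endif
        total += data[i];
    }
    return total;
}
```

Build it once plainly and once with `-DUSE_PREFETCH`, then time both on realistic data; the result is the same either way, so only the timing tells you whether the hint is worth keeping.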
In general though, I would recommend leaving software prefetch on the back burner as a last resort micro-optimisation after you've done all the more productive and obvious stuff.
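To even consider prefetching, code performance must already be a problem.
1: Use a code profiler. Trying to use prefetch without a profiler is a waste of time.
2: Whenever you find an unusually slow instruction in a critical place, you have a candidate for prefetching. Often the real problem is the memory access on the line before the slow one, rather than the slow line the profiler points at. Work out which memory access is causing the problem (not always easy) and prefetch it.
3: Run your profiler again and see whether it made any difference. If it didn't, take the prefetch out. On occasion I have sped up loops by over 300% this way. It is typically most effective when you have a loop that accesses memory in a non-sequential fashion.
I completely disagree that it is of little use on modern CPUs; I have found quite the opposite. On older CPUs prefetching about 100 instructions ahead was optimal, whereas nowadays I would put that number at more like 500.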
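To illustrate the non-sequential case, here is a minimal sketch of a loop that reads a table through an index array and prefetches the entry it will need a few iterations later. The names `gather_sum` and `PREFETCH_AHEAD` are hypothetical, the distance of 16 iterations is an arbitrary placeholder to be tuned with a profiler, and `__builtin_prefetch` assumes GCC or Clang.

```c
#include <stddef.h>

/* Hypothetical look-ahead in loop iterations; tune it with a profiler. */
#ifndef PREFETCH_AHEAD
#define PREFETCH_AHEAD 16
#endif

/* Sum table[idx[0]] ... table[idx[n-1]]. The indices jump around, so the
   hardware prefetcher cannot easily predict which table entry comes next. */
long gather_sum(const int *table, const size_t *idx, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* idx itself is read sequentially, so peeking ahead in it is
           cheap; the hint targets the scattered table entry. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&table[idx[i + PREFETCH_AHEAD]], 0, 1);
        total += table[idx[i]];
    }
    return total;
}
```

Profile the loop with and without the `__builtin_prefetch` line; if the hint makes no measurable difference, remove it.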