以下链接解释了UNIX(BSD风格)和Linux的x86-32系统调用约定:
但是UNIX和Linux上的x86-64系统调用约定是什么?
AMD has an ABI specification that describes the calling convention to use on x86-64. All OSes follow it, except for Windows which has it's own x86-64 calling convention. Why?
Does anyone know the technical, historical, or political reasons for this difference, or is it purely a matter of NIHsyndrome?
I understand that different OSes may have different needs for higher level things, but that doesn't explain why for example the register parameter passing order on Windows is rcx - rdx …
我想使用增强的REP MOVSB(ERMSB)为自定义获得高带宽memcpy.
ERMSB引入了Ivy Bridge微体系结构.如果您不知道ERMSB是什么,请参阅英特尔优化手册中的"增强型REP MOVSB和STOSB操作(ERMSB)" 部分.
我知道直接执行此操作的唯一方法是使用内联汇编.我从https://groups.google.com/forum/#!topic/gnu.gcc.help/-Bmlm_EG_fE获得了以下功能
static inline void *__movsb(void *d, const void *s, size_t n) {
asm volatile ("rep movsb"
: "=D" (d),
"=S" (s),
"=c" (n)
: "0" (d),
"1" (s),
"2" (n)
: "memory");
return d;
}
Run Code Online (Sandbox Code Playgroud)
然而,当我使用它时,带宽远小于memcpy.
使用我的i7-6700HQ(Skylake)系统,Ubuntu 16.10,DDR4 @ 2400 MHz双通道32 GB,GCC 6.2,__movsb获得15 GB/s并memcpy获得26 GB/s.
为什么带宽如此低REP MOVSB?我该怎么做才能改善它?
这是我用来测试它的代码.
//gcc -O3 -march=native -fopenmp foo.c
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include …Run Code Online (Sandbox Code Playgroud) 例如,累加器被命名EAX,并且在调用指令指针时IP.我也知道有些字节叫做CL和DH.我知道所有名字都必须有一个约定,但它是什么?