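I am currently working on a small optimization of a basic dot product function, using SSE instructions in Visual Studio.
Here is my code (the function calling convention is cdecl):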
float SSEDP4(const vect & vec1, const vect & vec2)
{
__asm
{
// get addresses
mov ecx, dword ptr[vec1]
mov edx, dword ptr[vec2]
// get the first vector
movups xmm1, xmmword ptr[ecx]
// get the second vector (must use movups, because data is not assured to be aligned to 16 bytes => TODO align data)
movups xmm2, xmmword ptr[edx]
// OP by OP multiply with second vector (by address)
mulps xmm1, xmm2
// add everything with horizontal …
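
For comparison, the load, multiply, and horizontal-sum steps commented above can also be sketched with compiler intrinsics instead of inline assembly. This is only a minimal sketch under stated assumptions: `Vec4` is a hypothetical stand-in for the `vect` type (whose definition is not shown), `dot4_sse` is an illustrative name, and the horizontal sum is done with shuffles that need only SSE1, which may differ from what the truncated code does.

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics: _mm_loadu_ps, _mm_mul_ps, ...

// Hypothetical stand-in for the question's `vect` type: four packed floats.
struct Vec4 {
    float v[4];
};

// 4-component dot product (illustrative name, not from the original code).
float dot4_sse(const Vec4& a, const Vec4& b)
{
    // Unaligned loads, mirroring movups: the data is not assumed to be
    // aligned to 16 bytes.
    __m128 va = _mm_loadu_ps(a.v);
    __m128 vb = _mm_loadu_ps(b.v);

    // Element-wise multiply, mirroring mulps: [p0, p1, p2, p3]
    __m128 prod = _mm_mul_ps(va, vb);

    // Horizontal sum using shuffles.
    __m128 high = _mm_movehl_ps(prod, prod);   // [p2, p3, p2, p3]
    __m128 sum2 = _mm_add_ps(prod, high);      // [p0+p2, p1+p3, ...]
    __m128 lane1 = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 sum = _mm_add_ss(sum2, lane1);      // lane 0 = p0+p2+p1+p3

    // Extract the lowest lane as a scalar float.
    return _mm_cvtss_f32(sum);
}
```

With intrinsics the compiler handles register allocation and the return value, so there is no need to track by hand which xmm register holds which vector.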