How many ways are there to set a register to zero?

28 x86 assembly tasm x86-16

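I'm curious how many ways there are to set a register to zero in x86 assembly, using a single instruction. Someone told me that he managed to find at least 10 ways to do it.

The ones I can think of are: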

xor ax,ax
mov ax, 0
and ax, 0

GJ.*_*GJ. 13

There are a lot of possibilities for moving 0 into ax under IA32...

    lea eax, [0]
    mov eax, 0FFFF0000h         //All constants from 0..0FFFFh << 16
    shr eax, 16                 //All constants from 16..31
    shl eax, 16                 //All constants from 16..31

Maybe the strangest one... :)

@movzx:
    movzx eax, byte ptr[@movzx + 6]   //Because the last byte of this instruction is 0

and...

  @movzx:
    movzx ax, byte ptr[@movzx + 7]
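To see why the last byte of these instructions is zero, here is a small Python sketch that builds the 32-bit-mode encoding by hand. It assumes the usual `0F B6 /r` opcode with ModRM byte `05` (a bare disp32 address) and a hypothetical label address of 0x00401000, which is not from the answer. The trick depends on the label's address fitting in 24 bits, so that the top byte of the displacement is 0.

```python
import struct

def encode_movzx_byte_abs(disp32, operand16=False):
    """Encode `movzx eax/ax, byte ptr [disp32]` for 32-bit mode.

    0F B6 is MOVZX r, r/m8; ModRM 0x05 means mod=00, reg=000 (eax/ax),
    rm=101 (disp32 only). A 66h prefix selects the 16-bit destination.
    """
    prefix = b"\x66" if operand16 else b""
    return prefix + bytes([0x0F, 0xB6, 0x05]) + struct.pack("<I", disp32)

# Hypothetical address of the @movzx label (an assumption for illustration).
label = 0x00401000

insn32 = encode_movzx_byte_abs(label + 6)                  # movzx eax, byte ptr [@movzx + 6]
insn16 = encode_movzx_byte_abs(label + 7, operand16=True)  # movzx ax,  byte ptr [@movzx + 7]

print(insn32.hex(" "))  # 0f b6 05 06 10 40 00
print(insn16.hex(" "))  # 66 0f b6 05 07 10 40 00
```

In both cases the referenced offset is the final byte of the instruction, which is the high byte of the little-endian displacement. If the code were loaded at or above 0x01000000 that byte would be non-zero and the trick would fail.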

EDIT:

And for 16-bit x86 CPU mode, not tested...:

    lea  ax, [0]

and...

  @movzx:
    movzx ax, byte ptr cs:[@movzx + 7]   //Check if 7 is right offset

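The cs: prefix is only needed when the ds segment register is not equal to the cs segment register; when they are equal it is optional.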


Pet*_*des 7

See this answer for the best way to zero a register: xor eax,eax (performance advantages, and a smaller encoding).


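I'll consider only the ways a single instruction can zero a register. There are far too many ways if you allow loading a zero from memory, so we'll mostly exclude instructions that load from memory.

I found 10 different single instructions that zero a 32-bit register (and thus the full 64-bit register in long mode), with no preconditions and no loads from any other memory. This doesn't count different encodings of the same insn, or the different forms of mov. If you count loading from memory that's known to hold a zero, or from segment registers or whatever, there are a boatload of ways. There are also a great many ways to zero vector registers.

For most of these, the eax and rax versions are separate encodings of the same functionality. Both zero the full 64-bit register, either by zeroing the upper half implicitly or by explicitly writing the full register with a REX.W prefix.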
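The implicit zero-extension can be modeled in a few lines of Python. This is only a sketch of the architectural rule for long mode (32-bit writes clear bits 63:32, while 8- and 16-bit writes merge into the old value). The function name `write_reg` is made up for illustration, and the high-byte registers (ah etc.) are not modeled.

```python
MASK64 = (1 << 64) - 1

def write_reg(old_rax, value, bits):
    """Model writing `value` to the low `bits` of a 64-bit register."""
    if bits == 64:
        return value & MASK64
    if bits == 32:
        # Writing a 32-bit register implicitly zeroes bits 63:32.
        return value & 0xFFFFFFFF
    # 8- and 16-bit writes leave the rest of the register unchanged.
    low_mask = (1 << bits) - 1
    return (old_rax & ~low_mask & MASK64) | (value & low_mask)

rax = 0x1122334455667788
print(hex(write_reg(rax, 0, 32)))  # xor eax,eax -> 0x0
print(hex(write_reg(rax, 0, 16)))  # xor ax,ax   -> 0x1122334455660000
print(hex(write_reg(rax, 0, 8)))   # xor al,al   -> 0x1122334455667700
```

This is why the 32-bit forms count as zeroing the whole 64-bit register, while the 16-bit and 8-bit forms only zero a partial register.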

Integer registers:

# Works on any reg unless noted, usually of any size.  eax/ax/al as placeholders
and    eax, 0         ; three encodings: imm8, imm32, and eax-only imm32
andn   eax, eax,eax   ; BMI1 instruction set: dest = ~s1 & s2
imul   eax, any,0     ; eax = something * 0.  two encodings: imm8, imm32
lea    eax, [0]       ; absolute encoding (disp32 with no base or index).  Use [abs 0] in NASM if you used DEFAULT REL
lea    eax, [rel 0]   ; YASM supports this, but NASM doesn't: use a RIP-relative encoding to address a specific absolute address, making position-dependent code

mov    eax, 0         ; 5 bytes to encode (B8 imm32)
mov    rax, strict dword 0   ; 7 bytes: REX mov r/m64, sign-extended-imm32.    NASM optimizes mov rax,0 to the 5B version, but dword or strict dword stops it for some reason
mov    rax, strict qword 0   ; 10 bytes to encode (REX B8 imm64).  movabs mnemonic for AT&T.  normally assemblers choose smaller encodings if the operand fits, but strict qword forces the imm64.

sub    eax, eax         ; recognized as a zeroing idiom on some but maybe not all CPUs
xor    eax, eax         ; Preferred idiom: recognized on all CPUs

@movzx:
  movzx eax, byte ptr[@movzx + 6]   //Because the last byte of this instruction is 0.  neat hack from GJ.'s answer

.l: loop .l             ; clears e/rcx... eventually.  from I. J. Kennedy's answer.  To operate on only ECX, use an address-size prefix.
; rep lodsb             ; not counted because it's not safe (potential segfaults), but also zeros ecx
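As a quick sanity check that the arithmetic entries in the list above give zero regardless of the old register contents, here is a Python model of their 32-bit results. The helper names are invented for this sketch, and it only models values, not flags, encodings, or performance.

```python
MASK32 = 0xFFFFFFFF

def and_imm(eax, imm):
    return eax & imm & MASK32

def andn(src1, src2):
    return ~src1 & src2 & MASK32      # BMI1: dest = ~s1 & s2

def imul_imm(src, imm):
    return (src * imm) & MASK32

def sub(dst, src):
    return (dst - src) & MASK32

def xor(dst, src):
    return (dst ^ src) & MASK32

def loop_to_self(ecx):
    """Model `.l: loop .l`: decrement ecx and repeat while it is non-zero."""
    iterations = 0
    while True:
        ecx = (ecx - 1) & MASK32
        iterations += 1
        if ecx == 0:
            return ecx, iterations

samples = [0, 1, 0x7FFFFFFF, 0x80000000, 0xDEADBEEF, MASK32]
for v in samples:
    results = [and_imm(v, 0), andn(v, v), imul_imm(v, 0), sub(v, v), xor(v, v)]
    print(hex(v), results)

print(loop_to_self(5))  # (0, 5)
```

Note that `loop_to_self` takes one iteration per count, and starting from ecx = 0 it would wrap around and run 2^32 times, which is the "eventually" in the comment above.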

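"Shifting all the bits out one end" isn't possible for regular-size GP registers, only for partial registers. shl and shr shift counts are masked: count &= 31;, which is equivalent to count %= 32;. (However, the 286 and earlier are 16-bit only, so ax is a "full" register there. The immediate-count form shr r/m16, imm8 was available by the 286, so there are CPUs where a single shift can zero a full integer register.)

Also note that shift counts for vectors saturate instead of wrapping.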

# Zeroing methods that only work on 16bit or 8bit regs:
shl    ax, 16           ; shift count is still masked to 0x1F for any operand size less than 64b.  i.e. count %= 32
shr    al, 16           ; so 8b and 16b shifts can zero registers.

# zeroing ah/bh/ch/dh:  Low byte of the reg = whatever garbage was in the high-8 reg
movzx  eax, ah          ; From Jerry Coffin's answer
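The difference between masked scalar shift counts and saturating vector shift counts can be sketched in Python. This models only the resulting value on 286-and-later CPUs (5-bit masking, or 6-bit for 64-bit operands). The function names are made up for illustration.

```python
def gp_shl(value, count, bits):
    """Model scalar SHL on a `bits`-wide register."""
    # The count is masked before the shift, whatever the operand size:
    # 5 bits for 8/16/32-bit operands, 6 bits for 64-bit operands.
    count &= 63 if bits == 64 else 31
    return (value << count) & ((1 << bits) - 1)

def vec_shl_element(value, count, bits):
    """Model one element of PSLLW/PSLLD/PSLLQ: the count saturates."""
    if count >= bits:
        return 0
    return (value << count) & ((1 << bits) - 1)

print(hex(gp_shl(0xFFFF, 16, 16)))               # shl ax, 16   -> 0x0
print(hex(gp_shl(0xFF, 16, 8)))                  # shl al, 16   -> 0x0
print(hex(gp_shl(0xFFFFFFFF, 32, 32)))           # shl eax, 32  -> 0xffffffff (count wraps to 0)
print(hex(vec_shl_element(0xFFFFFFFF, 32, 32)))  # pslld xmm0, 32 -> 0x0
```

A count of 16 survives the 5-bit mask, so it can empty an 8-bit or 16-bit register, but a count of 32 becomes 0 and leaves a 32-bit register untouched.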

Depending on other existing conditions (other than having a zero in another reg):

bextr  eax,  any, eax  ; if al >= 32, or ah = 0.  BMI1
BLSR   eax,  src       ; if src only has one set bit
CDQ                    ; edx = sign-extend(eax)
sbb    eax, eax        ; if CF=0.  (Only recognized on AMD CPUs as dependent only on flags (not eax))
setcc  al              ; with a condition that will produce a zero based on known state of flags

PSHUFB   xmm0, all-ones  ; xmm0 bytes are cleared when the mask bytes have their high bit set
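The preconditions in the comments above can be checked against a small Python model of the 32-bit results. This is a sketch written from the documented semantics (BEXTR's control word has the start bit in bits 7:0 and the length in bits 15:8), not something tested on hardware. The helper names are invented.

```python
MASK32 = 0xFFFFFFFF

def bextr(src, control):
    """BMI1 BEXTR, 32-bit: extract `length` bits of src starting at `start`."""
    start = control & 0xFF           # al when the control register is eax
    length = (control >> 8) & 0xFF   # ah when the control register is eax
    return (src >> start) & ((1 << length) - 1) & MASK32

def blsr(src):
    """BMI1 BLSR: clear the lowest set bit."""
    return src & (src - 1) & MASK32

def cdq(eax):
    """CDQ: edx = sign bit of eax replicated 32 times."""
    return MASK32 if eax & 0x80000000 else 0

def sbb_same(cf):
    """sbb eax, eax: eax - eax - CF."""
    return (0 - cf) & MASK32

print(hex(bextr(0xDEADBEEF, 0x0820)))  # start = 32        -> 0x0
print(hex(bextr(0xDEADBEEF, 0x0004)))  # length = 0        -> 0x0
print(hex(blsr(0x00010000)))           # one set bit       -> 0x0
print(hex(cdq(0x7FFFFFFF)))            # sign bit clear    -> 0x0
print(hex(sbb_same(0)))                # CF = 0            -> 0x0
```

When the precondition doesn't hold the result is non-zero, for example `cdq(0x80000000)` and `sbb_same(1)` both give 0xffffffff.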

Vector registers:

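Some of these SSE2 integer instructions can also be used on MMX registers (mm0 - mm7). Again, the best choice is some form of xor: either PXOR / VPXOR, or XORPS / VXORPS.

AVX vxorps xmm0,xmm0,xmm0 zeros the full ymm0/zmm0, and is better than vxorps ymm0,ymm0,ymm0 on AMD CPUs. These zeroing instructions have three encodings each: legacy SSE, AVX (VEX prefix), and AVX512 (EVEX prefix), although the SSE version only zeros the bottom 128 bits, which isn't the full register on CPUs that support AVX or AVX512. Anyway, depending on how you count, each entry can be three different instructions (same opcode, just different prefixes). The exception is vzeroall, which AVX512 didn't change (and which doesn't zero zmm16-31).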
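The difference between the legacy-SSE and VEX encodings can be illustrated with a Python model that treats a 512-bit register as one big integer. This is a simplified sketch of the architectural rule (legacy SSE leaves the upper bits alone, while a VEX-encoded 128-bit operation zeroes everything above bit 127). The function names are hypothetical.

```python
MASK128 = (1 << 128) - 1
MASK512 = (1 << 512) - 1

def legacy_sse_xorps_self(zmm):
    """Legacy-SSE `xorps xmm0, xmm0`: clears bits 127:0, keeps bits 511:128."""
    return zmm & ~MASK128 & MASK512

def vex_vxorps_xmm_self(zmm):
    """VEX `vxorps xmm0, xmm0, xmm0`: the 128-bit result is zero-extended."""
    low = (zmm ^ zmm) & MASK128   # the xor result for the low lane
    return low                    # upper bits are written as zero

zmm0 = MASK512  # every bit set
print(legacy_sse_xorps_self(zmm0) >> 128 != 0)  # True: upper bits survive
print(vex_vxorps_xmm_self(zmm0) == 0)           # True: whole register is zero
```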

ANDNPD    xmm0, xmm0
ANDNPS    xmm0, xmm0
PANDN     xmm0, xmm0     ; dest = ~dest & src

PCMPGTB   xmm0, xmm0     ; n > n is always false.
PCMPGTW   xmm0, xmm0     ; similarly, pcmpeqd is a good way to do _mm_set1_epi32(-1)
PCMPGTD   xmm0, xmm0
PCMPGTQ   xmm0, xmm0     ; SSE4.2, and slower than byte/word/dword


PSADBW    xmm0, xmm0     ; sum of absolute differences
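MPSADBW   xmm0, xmm0, 0  ; SSE4.1.  sliding-window sums of absolute differences.  Caution: with the register against itself only word 0 is guaranteed to be zero (the other seven words compare shifted windows), so this is not a general zeroing method and is not the same as PSADBW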

  ; shift-counts saturate and zero the reg, unlike for GP-register shifts
PSLLDQ    xmm0, 16       ;  left-shift the bytes in xmm0
PSRLDQ    xmm0, 16       ; right-shift the bytes in xmm0
PSLLW     xmm0, 16       ; left-shift the bits in each word
PSLLD     xmm0, 32       ;           double-word
PSLLQ     xmm0, 64       ;             quad-word
PSRLW/PSRLD/PSRLQ  ; same but right shift

PSUBB/W/D/Q   xmm0, xmm0     ; subtract packed elements, byte/word/dword/qword
PSUBSB/W   xmm0, xmm0     ; sub with signed saturation
PSUBUSB/W  xmm0, xmm0     ; sub with unsigned saturation

PXOR       xmm0, xmm0
XORPD      xmm0, xmm0
XORPS      xmm0, xmm0

VZEROALL

# Can raise an exception on SNaN, so only usable if you know exceptions are masked
CMPLTPD    xmm0, xmm0         # exception on QNaN or SNaN, or denormal
VCMPLT_OQPD xmm0, xmm0,xmm0   # exception only on SNaN or denormal
VCMPLT_OQPS ditto

VCMPFALSE_OQPD xmm0, xmm0, xmm0   # This is really just another imm8 predicate value for the same VCMPPD xmm,xmm,xmm, imm8 instruction.  Same exception behaviour as LT_OQ.

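SUBPS xmm0, xmm0 and similar won't work, because NaN-NaN = NaN, not zero.

Also, FP instructions can raise exceptions on NaN arguments, so even CMPPS/PD is only safe if you know exceptions are masked and you don't care about possibly setting the exception bits in MXCSR. Even the AVX version, with its expanded choice of predicates, will raise #IA on SNaN. The "quiet" predicates only suppress #IA for QNaN. CMPPS/PD can also raise the Denormal exception.

(See the table in the insn set ref entry for CMPPD, or preferably in Intel's original PDF, since the HTML extract mangles that table.)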
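The value-level part of this (why subtraction fails but a less-than compare works) can be checked with ordinary IEEE doubles in Python. This sketch models one vector element and does not model MXCSR exception flags at all. The function names are invented for illustration.

```python
import math

def sub_self(x):
    """One element of SUBPS/SUBPD xmm0, xmm0."""
    return x - x

def cmplt_self(x):
    """One element of CMPLTPD xmm0, xmm0: all-ones if x < x, else zero."""
    return 0xFFFFFFFFFFFFFFFF if x < x else 0

for x in (1.5, -0.0, math.inf, math.nan):
    print(x, sub_self(x), cmplt_self(x))
```

Subtraction gives NaN for both NaN and infinity inputs, while `x < x` is false for every input, including NaN, so the compare result is always zero.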

AVX512:

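There are probably several options here, but I'm not curious enough right now to go digging through the instruction set list looking for all of them.

There is one interesting one worth mentioning, though: VPTERNLOGD/Q can set a register to all-ones instead, with imm8 = 0xFF. (But it has a false dependency on the old value on current implementations.) Since the compare instructions all compare into a mask, VPTERNLOGD seems to be the best way to set a vector to all-ones on Skylake-AVX512 in my testing, although it doesn't special-case imm8 = 0xFF to avoid the false dependency.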

VPTERNLOGD zmm0, zmm0,zmm0, 0     ; inputs can be any registers you like.
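VPTERNLOGD treats imm8 as an 8-entry truth table indexed by the three input bits at each position, which is why imm8 = 0 gives all-zero and imm8 = 0xFF gives all-ones whatever the inputs are. Here is a Python sketch of one 32-bit element, assuming the documented index order (destination bit is the most significant). The function is a model for illustration, not an intrinsic.

```python
def vpternlogd(a, b, c, imm8, bits=32):
    """Model one element of VPTERNLOGD.

    For each bit position, the bits of a (dest), b (src1) and c (src2)
    form a 3-bit index that selects one bit of imm8.
    """
    result = 0
    for i in range(bits):
        index = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((imm8 >> index) & 1) << i
    return result

a, b, c = 0xDEADBEEF, 0x12345678, 0x0F0F0F0F
print(hex(vpternlogd(a, b, c, 0x00)))  # 0x0
print(hex(vpternlogd(a, b, c, 0xFF)))  # 0xffffffff
print(vpternlogd(a, b, c, 0x96) == a ^ b ^ c)  # True: 0x96 is a three-way XOR
```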

x87 FP:

Only one choice (because sub doesn't work if the old value was infinity or NaN).

FLDZ    ; push +0.0