Using the x87 FPU to convert a float to an integer with truncation rather than rounding

Question

Zac*_*son 5 assembly nasm x87

The FISTP instruction changes 0.75 to 1 (because it rounds).

I want 0.75 to become 0, not 1.

Is there an alternative to FIST/FISTP that truncates instead of rounding?

Answer 1

Cod*_*ray 6

You actually have quite a few options:

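If you are otherwise using SSE2 instructions, you can simply use an SSE2 instruction to convert a floating-point value to an integer with truncation. Peter Cordes's answer discusses this approach. CVTTSD2SI is the scalar version, and CVTTPD2DQ is the packed/vector version.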

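If you are targeting x86-64, SSE2 is always available, and it is what you should be using for all floating-point operations. The x87 FPU is altogether obsolete on x86-64.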
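
For readers working from C rather than hand-written assembly, here is a minimal sketch of the same conversion. The function name `truncate_double` and the macro `HAVE_SSE2_INTRINSICS` are made up for this illustration. It assumes the value fits in a 32-bit integer, and it falls back to a plain cast (which C defines as discarding the fractional part) when SSE2 intrinsics are not available.

```c
#include <stdint.h>

/* HAVE_SSE2_INTRINSICS is a hypothetical macro used only in this sketch. */
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_INTRINSICS 1
#endif

/* Truncate a double toward zero. Assumes the result fits in int32_t. */
static int32_t truncate_double(double x)
{
#ifdef HAVE_SSE2_INTRINSICS
    /* _mm_cvttsd_si32 maps to CVTTSD2SI, which truncates no matter
       what rounding mode is currently selected. */
    return _mm_cvttsd_si32(_mm_set_sd(x));
#else
    /* A C float-to-integer conversion discards the fractional part. */
    return (int32_t)x;
#endif
}
```

Calling `truncate_double(0.75)` gives 0 and `truncate_double(-2.999)` gives -2, which is the behavior the question asks for.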

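If you are targeting x86-32 processors older than the Pentium 4 or Athlon 64, SSE2 instructions will not be available. In that case, SSE instructions may still be available (the Pentium 3, Athlon XP, and later support SSE). SSE only supports single-precision floating-point operations, so if you do not need the precision, you can use CVTTSS2SI (scalar) or CVTTPS2PI (packed; it writes to an MMX register, because the XMM-destination form CVTTPS2DQ was only added with SSE2). Unfortunately, you often do need the precision; see below for a better workaround.

If SSE3 instructions are available (Pentium 4 Prescott, some Athlon 64, and later), you can use the FISTTP instruction. It works just like FISTP, except that it always truncates, regardless of the current rounding mode. This is the solution presented in fuz's answer.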

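This is a really nice solution if you are otherwise using the x87 FPU, but it is of limited applicability: any chip that supports SSE3 necessarily supports SSE2, so you should be using SSE instructions for all of your floating-point work. The only exception is if you really need the extended 80-bit precision that the x87 FPU offers for intermediate calculations (SSE2 is limited to 64-bit double precision).

If you are stuck targeting legacy x86-32 processors and using the x87 FPU without SSE, you are still not out of options. There are several fast bit-twiddling approaches. These are not original innovations of mine. The code is scattered in various places around the Internet, and I have only cleaned it up and tweaked it slightly, so I cannot take full credit, nor can I cite one specific source. Here is one such source.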

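For a single-precision floating-point value, the entire bit representation fits in a 32-bit register, so the implementation is simple (assuming the floating-point value to truncate is at the top of the x87 FPU stack):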
```
; Retrieve the bit representation of the original floating-point value.
push  eax
fst   DWORD PTR [esp]
mov   eax, DWORD PTR [esp]

; Twiddle those raw bits.
and   eax, 080000000H
xor   eax, 0BEFFFFFFH

; Store those manipulated bits back in memory, since we can't load        
; directly from a register to the x87 FPU stack.
mov   DWORD PTR [esp], eax

; Add the modified value to the original value at the top of the stack.
fadd  DWORD PTR [esp]

; Round the adjusted floating-point value to an integer.
; (With the FPU in its default round-to-nearest mode, our bit
; manipulation ensures that this stores the truncated value.)
fistp DWORD PTR [esp]

; ... do something with the result at [esp]

pop   eax
```
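An alternative implementation uses a static array of "adjustment" values, which we index by the sign of the original floating-point value. This is basically a naive "truncate" function as you would write it in C, except that it executes branchlessly: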
```
const uint32_t kSingleAdjustments[2] = { 0xBEFFFFFF,  /* -0.49999997f */
                                         0x3EFFFFFF   /* +0.49999997f */ };
```
```
; Retrieve the bit representation of the floating-point value.
push  eax
fst   DWORD PTR [esp]
mov   eax, DWORD PTR [esp]

; Isolate the sign bit.
shr   eax, 31

; Use the sign bit as an index into the array of values to add the appropriate
; adjustment value to the original floating-point value at the top of the stack.
; (NOTE: This syntax is for MSVC's inline asm; translate as necessary.)
fadd  DWORD PTR [kSingleAdjustments + (eax * TYPE kSingleAdjustments)]

; Round the adjusted floating-point value to an integer.
; (With the FPU in its default round-to-nearest mode, our adjustment
; ensures that this stores the truncated value.)
fistp DWORD PTR [esp]

; ... do something with the result at [esp]

pop   eax
```
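My benchmarks show that the second variant is faster on Intel processors but slower on AMD (specifically the Athlon XP and Athlon 64). I ultimately settled on approach #2 for my library, especially since I reuse the "adjustment" values to implement other kinds of fast rounding.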

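Note that the final FISTP instruction supports both m32 and m64 operands, so truncating to a 64-bit integer for extra range is possible. Just remember to allocate twice as much space on the stack, and then use fistp QWORD PTR [esp] instead of fistp DWORD PTR [esp].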

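I realize this all looks complicated, but it really is much faster than adjusting the rounding mode, doing the rounding, and then setting the rounding mode back. I have benchmarked it extensively on a variety of processors and in a variety of code paths, and have never found it to be slower. However, I use it from C code, where the standard requires the compiler to emit code that restores the rounding mode. If you are writing the assembly by hand and need truncation, just switch the FPU's rounding mode to "truncate" once and leave it there.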
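
To see why adding an adjustment of just under one half turns a round-to-nearest store into a truncation, the arithmetic behind both single-precision variants can be sketched in portable C. This is only an illustration, not the code from my library. The helper `round_to_nearest` is a stand-in for FISTP in round-to-nearest mode; it uses the well-known trick of adding and subtracting 1.5 * 2^52, which is a swapped-in technique and assumes that double arithmetic is evaluated in plain double precision, that the default rounding mode is active, and that the values are of modest size. All function names here are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for FISTP in round-to-nearest mode (assumption: default
   rounding mode, |v| far below 2^51). Adding 1.5 * 2^52 pushes the
   fraction out of the mantissa, so the addition itself does the rounding. */
static int32_t round_to_nearest(double v)
{
    const double magic = 6755399441055744.0;  /* 1.5 * 2^52 */
    volatile double t = v + magic;            /* rounded to an integer here */
    return (int32_t)(t - magic);              /* already integral */
}

static uint32_t bits_from_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

static float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Variant 1: isolate the sign bit, then XOR it into the constant so that
   positive inputs get -0.49999997f and negative inputs get +0.49999997f. */
static int32_t truncate_xor(float x)
{
    uint32_t adj = (bits_from_float(x) & 0x80000000u) ^ 0xBEFFFFFFu;
    return round_to_nearest((double)x + (double)float_from_bits(adj));
}

/* Variant 2: use the sign bit as an index into a table of adjustments. */
static const uint32_t kSingleAdjustments[2] = { 0xBEFFFFFFu,   /* -0.49999997f */
                                                0x3EFFFFFFu }; /* +0.49999997f */

static int32_t truncate_table(float x)
{
    uint32_t sign = bits_from_float(x) >> 31;
    return round_to_nearest((double)x + (double)float_from_bits(kSingleAdjustments[sign]));
}
```

For example, 0.75f becomes roughly 0.25 after the adjustment and rounds to 0, while 1.0f becomes just over 0.5 and rounds to 1, so exact integers survive and everything else is pulled toward zero.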

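This bit-twiddling code also comes in a double-precision version. The key is to realize that the sign bit lives in the upper 32 bits of the 64-bit double, so you still only need a single 32-bit register.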

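However, the double-precision version is not bug-free! Floating-point values that are very close to an integer will be rounded to that integer instead of being truncated (e.g., 4.99999977 is incorrectly rounded to 5 instead of being truncated to 4). Someone smarter than me, with more time to play with this, can probably find a fix, but I am satisfied with this accuracy in most cases, especially given the large speed improvement.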
```
const uint64_t kDoubleAdjustments[2] = { 0xBFDFFFFF00000000,
                                         0x3FDFFFFF00000000 };
```
```
sub   esp, 8
fst   QWORD PTR [esp]
mov   eax, DWORD PTR [esp+4]   ; we only need the upper 32 bits

shr   eax, 31
fadd  QWORD PTR [kDoubleAdjustments + (eax * TYPE kDoubleAdjustments)]

fistp DWORD PTR [esp]

; ... do something with the result at [esp]

add   esp, 8
```
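
The rounding error described above follows directly from the constants: 0x3FDFFFFF00000000 is 0.5 - 2^-22 (about 0.49999976), so any input whose fractional part is larger than that ends up past the halfway point after the adjustment. Here is a small C sketch that reproduces it. As before, this is an illustration with hypothetical function names, and `round_to_nearest` is a swapped-in stand-in for FISTP in round-to-nearest mode that assumes plain double-precision evaluation, the default rounding mode, and modest input values.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for FISTP in round-to-nearest mode (assumption: default
   rounding mode, |v| far below 2^51). */
static int32_t round_to_nearest(double v)
{
    const double magic = 6755399441055744.0;  /* 1.5 * 2^52 */
    volatile double t = v + magic;            /* rounded to an integer here */
    return (int32_t)(t - magic);              /* already integral */
}

static const uint64_t kDoubleAdjustments[2] = { 0xBFDFFFFF00000000ull,   /* -(0.5 - 2^-22) */
                                                0x3FDFFFFF00000000ull }; /* +(0.5 - 2^-22) */

static int32_t truncate_double_adjust(double x)
{
    uint64_t bits;
    double adj;

    memcpy(&bits, &x, sizeof bits);
    /* The sign is the top bit of the 64-bit pattern; the assembly reads
       it from the upper 32-bit half, which amounts to the same thing. */
    memcpy(&adj, &kDoubleAdjustments[bits >> 63], sizeof adj);
    return round_to_nearest(x + adj);
}
```

With these constants, `truncate_double_adjust(4.75)` gives 4 as intended, while `truncate_double_adjust(4.99999977)` gives 5, whereas a true truncation such as the C cast `(int32_t)4.99999977` gives 4.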

Answer 2

fuz*_*fuz 4

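The SSE3 instruction set also introduced the fisttp instruction. It works like the fistp instruction in that it can store a floating-point number as a 32-bit integer (popping the stack in the process), but it always truncates the value regardless of the current rounding mode.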

Here is an example of how to use it:

```
FLD    QWORD PTR [esi] ; load 64 bit floating point number
FISTTP DWORD PTR [edi] ; truncate and store as 32 bit integer
```

Or in AT&T syntax:

```
fldl    (%esi)
fisttpl (%edi)
```

If you do not have an SSE3-capable processor, you can get a similar result with the fistp instruction after making sure the rounding mode is set to "truncate":

```
sub    esp,0x4               ; make space for the control word
fstcw  WORD PTR [esp]        ; store the FPU control word
fstcw  WORD PTR [esp+0x2]    ; store another copy
or     WORD PTR [esp],0x0c00 ; set rounding mode to "truncate"
fldcw  WORD PTR [esp]        ; load updated control word
fld    QWORD PTR [esi]       ; load floating point number
fistp  DWORD PTR [edi]       ; truncate and store as 32 bit integer
fldcw  WORD PTR [esp+0x2]    ; restore control word
add    esp,0x4               ; release the scratch space
```

Or in AT&T syntax:

```
sub    $4,%esp
fstcw  (%esp)
fstcw  2(%esp)
orw    $0x0c00,(%esp)
fldcw  (%esp)
fldl   (%esi)
fistpl (%edi)
fldcw  2(%esp)
add    $4,%esp
```

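If your code is not meant to run on an 80286 or earlier, you may want to use fnstcw instead of fstcw to save one byte per instruction, at the cost of the code possibly not working on a real 8087.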
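
For anyone wondering where the 0x0c00 constant comes from: bits 10 and 11 of the x87 control word form the rounding-control field, and "truncate" is the encoding with both bits set, which is why a plain OR is enough whatever the previous mode was. Here is a small C sketch of that bit manipulation. The macro and function names are invented for this illustration; the code only computes control-word values and does not touch the FPU.

```c
#include <stdint.h>

/* Rounding-control (RC) field of the x87 control word: bits 10 and 11.
   These macro names are hypothetical, chosen for this sketch. */
#define X87_RC_MASK     0x0C00u
#define X87_RC_NEAREST  0x0000u  /* round to nearest (the default) */
#define X87_RC_DOWN     0x0400u  /* round toward negative infinity */
#define X87_RC_UP       0x0800u  /* round toward positive infinity */
#define X87_RC_TRUNCATE 0x0C00u  /* round toward zero */

/* Return a copy of control word cw with its RC field replaced by rc. */
static uint16_t x87_cw_with_rounding(uint16_t cw, uint16_t rc)
{
    return (uint16_t)((cw & ~X87_RC_MASK) | (rc & X87_RC_MASK));
}
```

Starting from the control word that FINIT sets up (0x037F), `x87_cw_with_rounding(0x037F, X87_RC_TRUNCATE)` gives 0x0F7F, the same value the `or` instruction in the assembly produces. Clearing the field first only matters when switching to a mode other than truncate.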
