How is the loss of precision from integer to floating point defined in C++?

use*_*183 9 c++ floating-point rounding static-cast

I have a question about the following code snippet:

long l=9223372036854775807L;
float f=static_cast<float>(l);

According to IEEE 754, the long value cannot be represented exactly.

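My question is how the lossy conversion is handled:

  1. Is the nearest floating-point representation taken?
  2. Is the next smaller/larger representation taken?
  3. Or is some other approach used?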

I am aware of the question "What happens in the background when converting int to float", but it does not answer my question.

Bla*_*aze 6

From here:

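A prvalue of integer or unscoped enumeration type can be converted to a prvalue of any floating-point type. If the value cannot be represented correctly, it is implementation defined whether the closest higher or the closest lower representable value will be selected, although if IEEE arithmetic is supported, rounding defaults to nearest. If the value cannot fit into the destination type, the behavior is undefined. If the source type is bool, the value false is converted to zero, and the value true is converted to one.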

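As for the IEEE 754 rounding rules, there seem to be five of them. However, I couldn't find any information on which one is used in which situation. It looks like it depends on the implementation; however, you can set the rounding mode in a C++ program as described here.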


eer*_*ika 6

C++ defines the conversion like this (quoting the latest standard draft):

[conv.fpint]

A prvalue of an integer type or of an unscoped enumeration type can be converted to a prvalue of a floating-point type. The result is exact if possible. If the value being converted is in the range of values that can be represented but the value cannot be represented exactly, it is an implementation-defined choice of either the next lower or higher representable value. [ Note: Loss of precision occurs if the integral value cannot be represented exactly as a value of the floating-point type. — end note ] If the value being converted is outside the range of values that can be represented, the behavior is undefined. If the source type is bool, the value false is converted to zero and the value true is converted to one.

The IEEE 754 standard defines conversion like this:

5.4.1 Arithmetic operations

It shall be possible to convert from all supported signed and unsigned integer formats to all supported arithmetic formats. Integral values are converted exactly from integer formats to floating-point formats whenever the value is representable in both formats. If the converted value is not exactly representable in the destination format, the result is determined according to the applicable rounding-direction attribute, and an inexact or floating-point overflow exception arises as specified in Clause 7, just as with arithmetic operations. The signs of integer zeros are preserved. Integer zeros without signs are converted to +0. The preferred exponent is 0.

Rounding modes are specified as:

4.3.1 Rounding-direction attributes to nearest

  • roundTiesToEven, the floating-point number nearest to the infinitely precise result shall be delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with an even least significant digit shall be delivered.

  • roundTiesToAway, the floating-point number nearest to the infinitely precise result shall be delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with larger magnitude shall be delivered.

4.3.2 Directed rounding attributes

  • roundTowardPositive, the result shall be the format’s floating-point number (possibly +∞) closest to and no less than the infinitely precise result

  • roundTowardNegative, the result shall be the format’s floating-point number (possibly −∞) closest to and no greater than the infinitely precise result

  • roundTowardZero, the result shall be the format’s floating-point number closest to and no greater in magnitude than the infinitely precise result.

4.3.3 Rounding attribute requirements

The roundTiesToEven rounding-direction attribute shall be the default rounding-direction attribute for results in binary formats.

So by default, your suggestion 1 would apply, but only if another mode hasn't been selected.


The C++ standard library inherits <cfenv> from the C standard. This header offers macros, functions and types for interacting with the floating point environment, including the rounding modes.