C++: Counting the number of possible floating-point values within a given range

Gif*_*guy 4 c++ algorithm floating-point comparison standards

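I am developing a cryptography application using Crypto++.
As an obscure part of this application, I need to determine the maximum number of unique floating-point values that can exist within a certain numeric range.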

Obviously, there are infinitely many real numbers between 0 and 1, but not all of them can be represented by a unique floating-point value.

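I have a minimum floating-point value and a maximum floating-point value.
I need to determine the number of possible floating-point values within this range.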

This is tricky because floating-point values are spaced farther apart the farther they are from 0.

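For example, the number of possible floating-point values between 0 and 1 is very different from the number of floating-point values between 100,000 and 100,001.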
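To make the spacing difference concrete, here is a minimal sketch that measures the gap between a float and the next representable float above it. It assumes `float` is IEEE 754 binary32; `GapAbove` is a hypothetical helper name used only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Gap between x and the next representable float above it.
// GapAbove is a hypothetical helper; assumes float is IEEE 754 binary32.
float GapAbove(float x)
{
    assert(std::isfinite(x));  // the gap is not meaningful for infinity or NaN
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}
```

Under that assumption, `GapAbove(1.0f)` is 2^-23 (about 1.19e-07), while `GapAbove(100000.0f)` is 2^-7 (0.0078125), so a unit-wide interval near 100,000 holds far fewer values than the interval from 0 to 1.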

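For my purposes, I want the count to include the minimum and maximum values as well.
However, an algorithm that produces an exclusive count would be equally useful, since I can simply add 1 or 2 as needed.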

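Additional concerns:
What if 0 is within the range?
For example, if the minimum is -2.0 and the maximum is +2.0, I don't want to count 0 twice (once for 0 and again for -0).
Also, what problems arise if the minimum or maximum is +/-infinity?
(If the minimum or maximum is NaN, I will probably just throw an exception.)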
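As a small illustration of the zero and infinity concerns, the sketch below inspects the bit patterns involved. It assumes `float` is IEEE 754 binary32; `BitsOf` and `CheckSpecialEncodings` are hypothetical names, and the assertions simply document what that format implies.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Copy the bits of a float into a same-sized unsigned integer.
// BitsOf is a hypothetical helper; assumes float is IEEE 754 binary32.
std::uint32_t BitsOf(float x)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // avoids the aliasing problem of reinterpret_cast
    return bits;
}

// Facts about the special values that a counting routine has to respect.
void CheckSpecialEncodings()
{
    // The two zeros compare equal but have different encodings.
    assert(0.0f == -0.0f);
    assert(BitsOf(0.0f) == 0x00000000u);
    assert(BitsOf(-0.0f) == 0x80000000u);

    // +infinity is encoded directly after the largest finite value.
    assert(BitsOf(std::numeric_limits<float>::max()) + 1
           == BitsOf(std::numeric_limits<float>::infinity()));
}
```

So a subtraction of raw encodings would count the two zeros as separate values unless they are merged first, while infinity behaves like one more step past the largest finite value.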

uint32_t RangeValueCount ( float fMin , float fMax )
{
    if ( fMin > fMax )
        swap ( fMin , fMax ) ;  // Ensure fMin <= fMax

    // Calculate the number of possible floating-point values between fMin and fMax.

    return ( *reinterpret_cast < uint32_t* > ( &fMax ) -
             *reinterpret_cast < uint32_t* > ( &fMin ) ) + 1 ;

    // This algorithm is obviously unsafe, assumes IEEE 754
    // How should I account for -0 or infinity?
}

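If this problem can be solved, I assume the solution would apply equally to double values (and possibly long double values, though that may be somewhat more complicated because of the 80-bit integer values, etc.).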

Eri*_*hil 5

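Here is code that handles all finite numbers. It expects IEEE 754 arithmetic. I have replaced my previous version with simpler, cleaner code. Rather than two implementations of the distance calculation, this one has two implementations of converting a floating-point number to its encoding (one by copying the bits, one by manipulating it mathematically). After that, the distance calculation is fairly simple (negative values must be adjusted, and then the distance is simply a subtraction).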

#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <cstring>  //  For std::memcpy.
#include <iostream>
#include <limits>


typedef double Float;       //  The floating-point type to use.
typedef std::uint64_t UInt; //  Unsigned integer of same size as Float.


/*  Define a value with only the high bit of a UInt set.  This is also the
    encoding of floating-point -0.
*/
static constexpr UInt HighBit
    = std::numeric_limits<UInt>::max() ^ std::numeric_limits<UInt>::max() >> 1;


//  Return the encoding of a floating-point number by copying its bits.
static UInt EncodingBits(Float x)
{
    UInt result;
    std::memcpy(&result, &x, sizeof result);
    return result;
}


//  Return the encoding of a floating-point number by using math.
static UInt EncodingMath(Float x)
{
    static constexpr int SignificandBits = std::numeric_limits<Float>::digits;
    static constexpr int MinimumExponent = std::numeric_limits<Float>::min_exponent;

    //  Encode the high bit.
    UInt result = std::signbit(x) ? HighBit : 0;

    //  If the value is zero, the remaining bits are zero, so we are done.
    if (x == 0) return result;

    /*  The C library provides a little-known routine to split a floating-point
        number into a significand and an exponent.  Note that this produces a
        normalized significand, not the actual significand encoding.  Notably,
        it brings significands of subnormals up to at least 1/2.  We will
        adjust for that below.  Also, this routine normalizes to [1/2, 1),
        whereas IEEE 754 is usually expressed with [1, 2), but that does not
        bother us.
    */
    int xe;
    Float xf = std::frexp(std::fabs(x), &xe);

    //  Test whether the number is subnormal.
    if (xe < MinimumExponent)
    {
        /*  For a subnormal value, the exponent encoding is zero, so we only
            have to insert the significand bits.  This scales the significand
            so that its low bit is scaled to the 1 position and then inserts it
            into the encoding.
        */
        result |= (UInt) std::ldexp(xf, xe - MinimumExponent + SignificandBits);
    }
    else
    {
        /*  For a normal value, the significand is encoded without its leading
            bit.  So we subtract .5 to remove that bit and then scale the
            significand so its low bit is scaled to the 1 position.
        */
        result |= (UInt) std::ldexp(xf - .5, SignificandBits);

        /*  The exponent is encoded with a bias of (in C++'s terminology)
            MinimumExponent - 1.  So we subtract that to get the exponent
            encoding and then shift it to the position of the exponent field.
            Then we insert it into the encoding.
        */
        result |= ((UInt) xe - MinimumExponent + 1) << (SignificandBits-1);
    }

    return result;
}


/*  Return the encoding of a floating-point number.  For illustration, we
    get the encoding with two different methods and compare the results.
*/
static UInt Encoding(Float x)
{
    UInt xb = EncodingBits(x);
    UInt xm = EncodingMath(x);

    if (xb != xm)
    {
        std::cerr << "Internal error encoding " << x << ".\n";
        std::cerr << "\tEncodingBits says " << xb << ".\n";
        std::cerr << "\tEncodingMath says " << xm << ".\n";
        std::exit(EXIT_FAILURE);
    }

    return xb;
}


/*  Return the distance from a to b as the number of values representable in
    Float from one to the other.  b must be greater than or equal to a.  0 is
    counted only once.
*/
static UInt Distance(Float a, Float b)
{
    UInt ae = Encoding(a);
    UInt be = Encoding(b);

    /*  For represented values from +0 to infinity, the IEEE 754 binary
        floating-points are in ascending order and are consecutive.  So we can
        simply subtract two encodings to get the number of representable values
        between them (including one endpoint but not the other).

        Unfortunately, the negative numbers are not adjacent and run the other
        direction.  To deal with this, if the number is negative, we transform
        its encoding by subtracting from the encoding of -0.  This gives us a
        consecutive sequence of encodings from the greatest magnitude finite
        negative number to the greatest finite number, in ascending order
        except for wrapping at the maximum UInt value.

        Note that this also maps the encoding of -0 to 0 (the encoding of +0),
        so the two zeroes become one point, so they are counted only once.
    */
    if (HighBit & ae) ae = HighBit - ae;
    if (HighBit & be) be = HighBit - be;

    //  Return the distance between the two transformed encodings.
    return be - ae;
}


static void Try(Float a, Float b)
{
    std::cout << "[" << a << ", " << b << "] contains "
        << Distance(a,b) + 1 << " representable values.\n";
}


int main(void)
{
    if (sizeof(Float) != sizeof(UInt))
    {
        std::cerr << "Error, UInt must be an unsigned integer the same size as Float.\n";
        std::exit(EXIT_FAILURE);
    }

    /*  Prepare some test values:  smallest positive (subnormal) value, largest
        subnormal value, smallest normal value.
    */
    Float S1 = std::numeric_limits<Float>::denorm_min();
    Float N1 = std::numeric_limits<Float>::min();
    Float S2 = N1 - S1;

    //  Test 0 <= a <= b.
    Try( 0,  0);
    Try( 0, S1);
    Try( 0, S2);
    Try( 0, N1);
    Try( 0, 1./3);
    Try(S1, S1);
    Try(S1, S2);
    Try(S1, N1);
    Try(S1, 1./3);
    Try(S2, S2);
    Try(S2, N1);
    Try(S2, 1./3);
    Try(N1, N1);
    Try(N1, 1./3);

    //  Test a <= b <= 0.
    Try(-0., -0.);
    Try(-S1, -0.);
    Try(-S2, -0.);
    Try(-N1, -0.);
    Try(-1./3, -0.);
    Try(-S1, -S1);
    Try(-S2, -S1);
    Try(-N1, -S1);
    Try(-1./3, -S1);
    Try(-S2, -S2);
    Try(-N1, -S2);
    Try(-1./3, -S2);
    Try(-N1, -N1);
    Try(-1./3, -N1);

    //  Test a <= 0 <= b.
    Try(-0., +0.);
    Try(-0., S1);
    Try(-0., S2);
    Try(-0., N1);
    Try(-0., 1./3);
    Try(-S1, +0.);
    Try(-S1, S1);
    Try(-S1, S2);
    Try(-S1, N1);
    Try(-S1, 1./3);
    Try(-S2, +0.);
    Try(-S2, S1);
    Try(-S2, S2);
    Try(-S2, N1);
    Try(-S2, 1./3);
    Try(-N1, +0.);
    Try(-N1, S1);
    Try(-N1, S2);
    Try(-N1, N1);
    Try(-N1, 1./3);
    Try(-1./3, +0.);
    Try(-1./3, S1);
    Try(-1./3, S2);
    Try(-1./3, N1);
    Try(-1./3, 1./3);

    return 0;
}
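For comparison with the question's `float` signature, here is a minimal sketch of the same encoding idea applied to `float`. It is not the code above: it swaps the unsigned wraparound for a signed 64-bit key, which is only possible because a 32-bit encoding fits comfortably in `std::int64_t`. It assumes IEEE 754 binary32, and the names `OrderedKey` and `RangeValueCountInclusive` are hypothetical. Unlike the code above, it also accepts infinities as endpoints and throws on NaN.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <utility>

// Map a float to an integer key that increases with the float's value.
// +0 and -0 map to the same key (0). Assumes IEEE 754 binary32.
static std::int64_t OrderedKey(float x)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    const std::uint32_t signBit = 0x80000000u;
    const std::int64_t magnitude = bits & ~signBit;  // encoding without the sign
    return (bits & signBit) ? -magnitude : magnitude;
}

// Inclusive count of representable floats in [lo, hi].
// Infinite endpoints are counted as values; NaN endpoints are rejected.
std::uint64_t RangeValueCountInclusive(float lo, float hi)
{
    if (std::isnan(lo) || std::isnan(hi))
        throw std::invalid_argument("range endpoints must not be NaN");
    if (lo > hi)
        std::swap(lo, hi);  // ensure lo <= hi
    return static_cast<std::uint64_t>(OrderedKey(hi) - OrderedKey(lo)) + 1;
}
```

With these assumptions, `RangeValueCountInclusive(0.0f, 1.0f)` gives 1065353217, `RangeValueCountInclusive(100000.0f, 100001.0f)` gives 129, and `RangeValueCountInclusive(-0.0f, 0.0f)` gives 1. The signed-key shortcut does not carry over to `double` with 64-bit integers, which is where the unsigned subtraction used in the code above is the better fit.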


Adv*_*ere -1

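Tricky. You could probably try using std::nexttoward(from_starting, to_end); in a loop and count until you reach the end. I haven't tried it myself, and it would take a long time to finish. If you do this, make sure to check the error flags; see: http://en.cppreference.com/w/cpp/numeric/math/nextafter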
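A minimal sketch of that stepping loop might look like the following. It uses `std::nextafter` (documented on the same page as `std::nexttoward`) so that both arguments stay `float`; `CountByStepping` is a hypothetical name, and the sketch assumes the default floating-point environment with subnormals enabled. It is only practical for narrow ranges.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Count representable floats in [lo, hi] by visiting each one in turn.
// CountByStepping is a hypothetical helper; requires lo <= hi and no NaNs.
std::uint64_t CountByStepping(float lo, float hi)
{
    assert(lo <= hi);
    std::uint64_t count = 1;  // counts lo itself
    for (float x = lo; x < hi; x = std::nextafter(x, hi))
        ++count;              // one more value strictly above x and not above hi
    return count;
}
```

For example, `CountByStepping(100000.0f, 100001.0f)` visits 129 values. Because `-0.0f < 0.0f` is false, a zero is passed through only once when the range spans it. The running time grows with the count itself, which is why this approach does not scale to wide `float` ranges or to `double`.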

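  • To get an idea of how long this might take, I tried counting from 0 to 1.0 using `float` mapped to IEEE-754 `binary32`; it took 7.6 seconds on a 3.5 GHz x86 processor (compiled with Intel C/C++ 13.x, `/Ox /QxHOST /fp:strict`). (4 upvotes)
  • This is not a practical answer, and it is certainly infeasible for 64-bit floating point. (2 upvotes)
  • Fully agree that it is impractical. I don't know how to do this in a way that is both efficient and accurate. (2 upvotes)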