Strange behaviour with floats and string conversion

Question

Mar*_*oma 18 python floating-point python-2.x

I typed this into the Python shell:

>>> 0.1*0.1
0.010000000000000002

I expected that 0.1*0.1 would not be exactly 0.01, because I know that 0.1 in base 10 is periodic in base 2.

>>> len(str(0.1*0.1))
4

I expected to get 20, since I saw 20 characters above. Why do I get 4?

>>> str(0.1*0.1)
'0.01'

OK, this explains why len gives me 4, but why does str return '0.01'?

>>> repr(0.1*0.1)
'0.010000000000000002'

Why does str round while repr does not? (I have read this answer, but I would like to know how they decided when str rounds a float and when it does not.)

>>> str(0.01) == str(0.0100000000001)
False
>>> str(0.01) == str(0.01000000000001)
True

So it seems to be a problem with the accuracy of floats. I thought Python would use IEEE 754 single-precision floats, so I checked it like this:

#include <stdint.h>
#include <stdio.h> // printf

union myUnion {
    uint32_t i; // unsigned integer 32-bit type (on every machine)
    float f;    // a type you want to play with
};

int main() {
    union myUnion testVar;
    testVar.f = 0.01000000000001f;
    printf("%f\n", testVar.f);

    testVar.f = 0.01000000000000002f;
    printf("%f\n", testVar.f);

    testVar.f = 0.01f*0.01f;
    printf("%f\n", testVar.f);
}

I got:

0.010000
0.010000
0.000100

Python gave me:

>>> 0.01000000000001
0.010000000000009999
>>> 0.01000000000000002
0.010000000000000019
>>> 0.01*0.01
0.0001

Why does Python give me these results?

(I use Python 2.6.5. If you know of differences between Python versions, I would also be interested in them.)

Answer 1

eca*_*mur 16

The key requirement on repr is that it should round-trip; that is, eval(repr(f)) == f should give True in all cases.
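
The round-trip property can be checked directly. The following is a minimal sketch, assuming Python 3; the helper name round_trips is hypothetical, and float() is used in place of eval() only because it is the safer way to parse a numeric string.

```python
def round_trips(x, precision):
    """Return True if formatting x with `precision` significant digits
    and parsing the text back yields exactly the same float."""
    text = format(x, ".%dg" % precision)
    return float(text) == x


value = 0.1 * 0.1

print(round_trips(value, 17))       # True: 17 significant digits suffice for a double
print(round_trips(value, 12))       # False: '0.01' parses to a different double
print(float(repr(value)) == value)  # True: repr is designed to round-trip
```

Seventeen significant digits are enough to identify any finite 64-bit float uniquely, which is why the shorter 12-digit form can fail the check.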

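In Python 2.x (before 2.7), repr works by calling printf with the format %.17g and discarding trailing zeroes. IEEE-754 guarantees that this is correct for 64-bit floats. Since 2.7 and 3.1, Python uses a more intelligent algorithm that can find shorter representations in some cases where %.17g gives unnecessary non-zero terminal digits or terminal nines. See What's New in 3.1? and issue 1580.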
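
To see the difference between the two strategies side by side, the old behaviour can be approximated on a modern interpreter. This is only a sketch, assuming Python 3.1 or later: old_style_repr is a hypothetical helper, and it ignores details such as the ".0" that real Python appends to integral values.

```python
def old_style_repr(x):
    # Hypothetical helper: approximates repr() before 2.7/3.1 by
    # formatting with 17 significant digits (%g drops trailing zeros).
    return format(x, ".17g")


for x in (0.1, 0.01, 0.1 * 0.1):
    # Left column: 17-digit form; right column: what repr() gives today.
    print(old_style_repr(x).ljust(24), repr(x))
```

For 0.1 the 17-digit form is '0.10000000000000001' while the newer repr gives '0.1'; for 0.1 * 0.1 both forms agree, because no shorter string identifies that value.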

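Even under Python 2.7, repr(0.1 * 0.1) gives "0.010000000000000002". This is because 0.1 * 0.1 == 0.01 is False under IEEE-754 parsing and arithmetic; that is, the nearest 64-bit floating-point value to 0.1, when multiplied by itself, yields a 64-bit floating-point value that is not the nearest 64-bit floating-point value to 0.01: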

>>> 0.1.hex()
'0x1.999999999999ap-4'
>>> (0.1 * 0.1).hex()
'0x1.47ae147ae147cp-7'
>>> 0.01.hex()
'0x1.47ae147ae147bp-7'
                 ^ 1 ulp difference
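
The same one-ulp gap can be confirmed by comparing the raw bit patterns. A small sketch, assuming Python 3 and an IEEE-754 binary64 float; float_bits is a hypothetical helper name.

```python
import struct


def float_bits(x):
    """Return the 64-bit IEEE-754 pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


product = 0.1 * 0.1
literal = 0.01

print(hex(float_bits(product)))
print(hex(float_bits(literal)))

# Adjacent positive doubles have consecutive bit patterns, so a
# difference of 1 means the two values are neighbours (1 ulp apart).
print(float_bits(product) - float_bits(literal))  # 1
```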

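The difference between repr and str (before 2.7/3.1) is that str formats with 12 significant digits rather than 17, which does not round-trip but produces more readable results in many cases.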
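
The results in the question can be reproduced on a modern interpreter by formatting with 12 significant digits explicitly. This is a sketch under that assumption: old_style_str is a hypothetical helper, and on Python 3.2 and later the built-in str() of a float returns the same text as repr(), so it no longer rounds like this.

```python
def old_style_str(x):
    # Hypothetical helper: approximates str() on Python 2.x floats by
    # formatting with 12 significant digits.
    return format(x, ".12g")


print(old_style_str(0.1 * 0.1))                                 # 0.01
print(old_style_str(0.01) == old_style_str(0.0100000000001))    # False
print(old_style_str(0.01) == old_style_str(0.01000000000001))   # True
```

0.0100000000001 has exactly 12 significant digits and survives the formatting, while 0.01000000000001 needs 13 and is rounded to '0.01'. That is where the boundary seen in the question comes from.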

Answer 2

Kat*_*iel 5

I can confirm your behaviour:

ActivePython 2.6.4.10 (ActiveState Software Inc.) based on
Python 2.6.4 (r264:75706, Jan 22 2010, 17:24:21) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> repr(0.1)
'0.10000000000000001'
>>> repr(0.01)
'0.01'

Now, the docs claim that in Python < 2.7

the value of repr(1.1) was computed as format(1.1, '.17g')

This is a slight simplification.

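Note that this is all to do with the string formatting code. In memory, all Python floats are simply stored as C doubles, so there is never a difference between them.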
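
The storage format can be inspected from Python itself, which also addresses the single-precision assumption in the question. A minimal sketch, assuming CPython on a platform where a C double is IEEE-754 binary64; to_single is a hypothetical helper that rounds a value to single precision for comparison.

```python
import struct
import sys

print(sys.float_info.mant_dig)  # 53 significand bits, i.e. double precision
print(struct.calcsize("<d"))    # 8 bytes per float


def to_single(x):
    """Round x to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# A single-precision 0.01 differs from the double Python actually stores.
print(to_single(0.01) == 0.01)  # False
```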

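Also, it is rather unpleasant to work with the full-length string for a float when you know a better one exists. Indeed, in modern Pythons a new algorithm is used for float formatting that picks the shortest representation in a smart way.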
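
The idea of a shortest round-tripping representation can be illustrated with a brute-force search. This is not how CPython does it (the real implementation is a dedicated, much faster algorithm); it is only a sketch, assuming Python 3, with the hypothetical helper name shortest_round_trip.

```python
def shortest_round_trip(x):
    """Return the shortest '%g'-style rendering of x that parses back to x."""
    for precision in range(1, 18):
        text = format(x, ".%dg" % precision)
        if float(text) == x:
            return text
    # Fallback for values that never compare equal, such as NaN.
    return format(x, ".17g")


print(shortest_round_trip(0.1))        # 0.1
print(shortest_round_trip(0.1 * 0.1))  # 0.010000000000000002
```

The output can differ cosmetically from repr() (for example, 1.0 comes out as '1'), but the digits chosen show why 0.1 prints briefly while 0.1 * 0.1 needs all 17 digits.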

I spent a while looking this up in the source code, so I'll include the details here in case you're interested. You can skip this section.

In floatobject.c, we see

static PyObject *
float_repr(PyFloatObject *v)
{
    char buf[100];
    format_float(buf, sizeof(buf), v, PREC_REPR);

    return PyString_FromString(buf);
}

This leads us to look at format_float. Omitting the NaN/inf special cases, it is:

format_float(char *buf, size_t buflen, PyFloatObject *v, int precision)
{
    register char *cp;
    char format[32];
    int i;

    /* Subroutine for float_repr and float_print.
       We want float numbers to be recognizable as such,
       i.e., they should contain a decimal point or an exponent.
       However, %g may print the number as an integer;
       in such cases, we append ".0" to the string. */

    assert(PyFloat_Check(v));
    PyOS_snprintf(format, 32, "%%.%ig", precision);
    PyOS_ascii_formatd(buf, buflen, format, v->ob_fval);
    cp = buf;
    if (*cp == '-')
        cp++;
    for (; *cp != '\0'; cp++) {
        /* Any non-digit means it's not an integer;
           this takes care of NAN and INF as well. */
        if (!isdigit(Py_CHARMASK(*cp)))
            break;
    }
    if (*cp == '\0') {
        *cp++ = '.';
        *cp++ = '0';
        *cp++ = '\0';
        return;
    }

    <some NaN/inf stuff>
}

We can see that this first initialises some variables and checks that v is a well-formed float. It then prepares a format string:

PyOS_snprintf(format, 32, "%%.%ig", precision);

Now PREC_REPR is defined elsewhere in floatobject.c as 17, so this evaluates to "%.17g". Next we call

PyOS_ascii_formatd(buf, buflen, format, v->ob_fval);

With the end of the tunnel in sight, we look up PyOS_ascii_formatd and find that it uses snprintf internally.
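
The logic of the C helper traced above can be transcribed into a few lines of Python. This is a rough sketch for illustration, assuming Python 3, and not a faithful port: it skips the buffer handling and relies on Python's own %g formatting. PREC_STR = 12 is taken from the 12-digit figure given in Answer 1 rather than from the code quoted here.

```python
PREC_REPR = 17
PREC_STR = 12  # assumption: the 12-digit precision mentioned in Answer 1


def format_float(value, precision):
    """Rough Python transcription of the format_float helper quoted above."""
    text = "%.*g" % (precision, value)
    # Skip a leading minus sign, as the C loop does.
    digits = text[1:] if text.startswith("-") else text
    # If only digits remain, %g printed an integer: append ".0" so the
    # result is still recognisable as a float.
    if digits.isdigit():
        text += ".0"
    return text


print(format_float(0.1, PREC_REPR))       # 0.10000000000000001
print(format_float(0.1 * 0.1, PREC_STR))  # 0.01
print(format_float(1.0, PREC_REPR))       # 1.0
```

The ".0" step is the part that a bare format(x, '.17g') leaves out, which is why describing repr that way is a slight simplification.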

Archived: 13 years ago
Views: 2639
Last activity: 13 years ago