用 10 条或更少的指令实现 tanh(x) 的最佳非三角浮点近似

hid*_*kgb 16 algorithm math floating-point ieee-754 approximation

描述

\n

对于没有内置浮点三角学的机器,我需要一个相当准确的快速双曲正切,因此例如通常的tanh(x) = (exp(2x) - 1) / (exp(2x) + 1)公式将需要近似值exp(2x)
\n所有其他指令,如加法、减法、乘法、除法,甚至 FMA(= 1 次操作中的 MUL+ADD)都存在。

\n

现在我有几个近似值,但没有一个在准确性方面令人满意。

\n

[评论更新:]

\n
    \n
  • trunc()/的说明floor()可用
  • \n
  • 有一种方法可以透明地将浮点数重新解释为整数并执行各种位操作
  • \n
  • 有一系列称为 SEL.xx(.GT、.LE 等)的指令,它们比较 2 个值并选择要写入目标的内容
  • \n
  • DIV 慢两倍,所以没有什么异常,DIV 可以使用
  • \n
\n

方法一

\n

\n

精度:\xc2\xb11.2% 绝对误差,请参见此处

\n

伪代码(A = 累加器寄存器,T = 临时寄存器):

\n
[1] FMA T, 36.f / 73.f, A, A   // T := 36/73 + X^2\n[2] MUL A, A, T                // A := X(36/73 + X^2)\n[3] ABS T, A                   // T := |X(36/73 + X^2)|\n[4] ADD T, T, 32.f / 73.f      // T := |X(36/73 + X^2)| + 32/73\n[5] DIV A, A, T                // A := X(36/73 + X^2) / (|X(36/73 + X^2)| + 32/73)\n
Run Code Online (Sandbox Code Playgroud)\n

方法2

\n

\n

精度:\xc2\xb10.9% 绝对误差,请参见此处

\n

伪代码(A = 累加器寄存器,T = 临时寄存器):

\n
[1] FMA T, 3.125f, A, A        // T := 3.125 + X^2\n[2] DIV T, 25.125f, T          // T := 25.125/(3.125 + X^2)\n[3] MUL A, A, 0.1073f          // A := 0.1073*X\n[4] FMA A, A, A, T             // A := 0.1073*X + 0.1073*X*25.125/(3.125 + X^2)\n[5] MIN A, A, 1.f              // A := min(0.1073*X + 0.1073*X*25.125/(3.125 + X^2), 1)\n[6] MAX A, A, -1.f             // A := max(min(0.1073*X + 0.1073*X*25.125/(3.125 + X^2), 1), -1)\n
Run Code Online (Sandbox Code Playgroud)\n

方法3

\n

\n

精度:\xc2\xb10.13% 绝对误差,请参见此处

\n

伪代码(A = 累加器寄存器,T = 临时寄存器):

\n
[1] FMA T, 14.f, A, A          // T := 14 + X^2\n[2] FMA T, -133.f, T, T        // T := (14 + X^2)^2 - 133\n[3] DIV T, A, T                // T := X/((14 + X^2)^2 - 133)\n[4] FMA A, 52.5f, A, A         // A := 52.5 + X^2\n[5] MUL A, A, RSQRT(15.f)      // A := (52.5 + X^2)/sqrt(15)\n[6] FMA A, -120.75f, A, A      // A := (52.5 + X^2)^2/15 - 120.75\n[7] MUL A, A, T                // A := ((52.5 + X^2)^2/15 - 120.75)*X/((14 + X^2)^2 - 133)\n[8] MIN A, A, 1.f              // A := min(((52.5 + X^2)^2/15 - 120.75)*X/((14 + X^2)^2 - 133), 1)\n[9] MAX A, A, -1.f             // A := max(min(((52.5 + X^2)^2/15 - 120.75)*X/((14 + X^2)^2 - 133), 1), -1)\n
Run Code Online (Sandbox Code Playgroud)\n

问题

\n

有没有更好的东西可以容纳 10 个非三角 float32 指令?

\n

nju*_*ffa 11

经过大量的探索工作,我得出的结论是方法2是最有前途的方向。由于除法在提问者的平台上非常快,有理近似很有吸引力。应积极利用该平台对 FMA 的支持。下面我展示了 C 代码,该代码tanhf()在七次操作中实现了快速,并实现了小于 2.8e-3 的最大绝对误差。

\n

我使用 Remez 算法来计算有理近似的系数,并使用启发式搜索将这些系数减少到尽可能少的位,这可能有利于某些能够将浮点数据合并到常见的直接字段中的处理器架构。使用浮点指令。

\n
#include <stdio.h>\n#include <stdlib.h>\n#include <math.h>\n\n/* Fast computation of hyperbolic tangent. Rational approximation with clamping.\n   Maximum absolute errror = 2.77074604e-3 @ +/-3.29019976\n*/\nfloat fast_tanhf_rat (float x)\n{\n    const float n0 = -8.73291016e-1f; // -0x1.bf2000p-1\n    const float n1 = -2.76107788e-2f; // -0x1.c46000p-6\n    const float d0 =  2.79589844e+0f; //  0x1.65e000p+1\n    float x2 = x * x;\n    float num = fmaf (n0, x2, n1);\n    float den = x2 + d0;\n    float quot = num / den;\n    float res = fmaf (quot, x, x);\n    res = fminf (fmaxf (res, -1.0f), 1.0f);\n    return res;\n}\n\nint main (void)\n{\n    double ref, err, maxerr = 0;\n    float arg, res, maxerrloc = INFINITY;\n    maxerr = 0;\n    arg = 0.0f;\n    while (arg < 0x1.0p64f) {\n        res = fast_tanhf_rat (arg);\n        ref = tanh ((double)arg);\n        err = fabs ((double)res - ref);\n        if (err > maxerr) {\n            maxerr = err;\n            maxerrloc = arg;\n        }\n        arg = nextafterf (arg, INFINITY);\n    }\n    arg = -0.0f;\n    while (arg > -0x1.0p64f) {\n        res = fast_tanhf_rat (arg);\n        ref = tanh ((double)arg);\n        err = fabs ((double)res - ref);\n        if (err > maxerr) {\n            maxerr = err;\n            maxerrloc = arg;\n        }\n        arg = nextafterf (arg, -INFINITY);\n    }\n    printf ("maximum absolute error = %15.8e @ %15.8e\\n", maxerr, maxerrloc);\n    return EXIT_SUCCESS;\n}\n
Run Code Online (Sandbox Code Playgroud)\n

考虑到提问者预算最多十次操作,我们可以将分子和分母多项式的次数增加一,以实现tanhf()包含九次操作的快速实现,其最大绝对误差显着降低,小于 5.8e-5:

\n
#include <stdio.h>\n#include <stdlib.h>\n#include <math.h>\n\n/* Fast computation of hyperbolic tangent. Rational approximation with clamping.\n   Maximum absolute error = 5.77514052e-5 @ +/-=2.22759748\n */\nfloat fast_tanhf_rat2 (float x)\n{\n    const float n0 = -9.49066162e-1f; // -0x1.e5ec00p-1\n    const float n1 = -2.68447266e+1f; // -0x1.ad8400p+4\n    const float n2 = -2.01115608e-2f; // -0x1.498200p-6\n    const float d0 =  3.49853516e+1f; //  0x1.17e200p+5\n    const float d1 =  8.07031250e+1f; //  0x1.42d000p+6\n    float x2 = x * x;\n    float num = fmaf (fmaf (n0, x2, n1), x2, n2);\n    float den = fmaf (x2 + d0, x2, d1);\n    float quot = num / den;\n    float res = fmaf (quot, x, x);\n    res = fminf (fmaxf (res, -1.0f), 1.0f);\n    return res;\n}\n\nint main (void)\n{\n    double ref, err, maxerr = 0;\n    float arg, res, maxerrloc = INFINITY;\n    maxerr = 0;\n    arg = 0.0f;\n    while (arg < 0x1.0p32f) {\n        res = fast_tanhf_rat2 (arg);\n        ref = tanh ((double)arg);\n        err = fabs ((double)res - ref);\n        if (err > maxerr) {\n            maxerr = err;\n            maxerrloc = arg;\n        }\n        arg = nextafterf (arg, INFINITY);\n    }\n    arg = -0.0f;\n    while (arg > -0x1.0p32f) {\n        res = fast_tanhf_rat2 (arg);\n        ref = tanh ((double)arg);\n        err = fabs ((double)res - ref);\n        if (err > maxerr) {\n            maxerr = err;\n            maxerrloc = arg;\n        }\n        arg = nextafterf (arg, -INFINITY);\n    }\n    printf ("maximum absolute error = %15.8e @ %15.8e\\n", maxerr, maxerrloc);\n    return EXIT_SUCCESS;\n}\n
Run Code Online (Sandbox Code Playgroud)\n

如果我们可以保证近似可以产生此范围之外的值,则无需将近似的输出限制在区间 [-1, 1] 内。可以对单精度实现进行详尽的测试,因此可以表明,通过稍微调整近似的系数,可以成功地执行这一点。通过将参数限制为特定的单精度数(近似值返回值 \xc2\xb11),可以实现正确的渐近行为。这要求所有基本算术运算(特别是除法)都符合 IEEE-754,从而正确舍入,所有操作数都是 IEEE-754binary32操作数,并且舍入到最接近或偶数有效。使用提问者允许的最多 10 次操作,可以实现小于 2.0e-5 的最大绝对和相对误差:

\n
#include <stdio.h>\n#include <stdlib.h>\n#include <math.h>\n\n/* Fast computation of hyperbolic tangent. Rational approximation with clamping\n   of the argument. Maximum absolute error = 1.98537030e-5, maximum relative\n   error = 1.98540995e-5, maximum ulp error = 333.089863.\n*/\nfloat fast_tanhf_rat3 (float x) // 10 operations\n{\n    const float cutoff = 5.76110792f; //  0x1.70b5fep+2\n    const float n0 = -1.60153955e-4f; // -0x1.4fde00p-13\n    const float n1 = -9.34448242e-1f; // -0x1.de7000p-1\n    const float n2 = -2.19176636e+1f; // -0x1.5eaec0p+4\n    const float d0 =  2.90915985e+1f; //  0x1.d17730p+4\n    const float d1 =  6.57667847e+1f; //  0x1.071130p+6\n    float y = fminf (fmaxf (x, -cutoff), cutoff);\n    float y2 = y * y;\n    float num = fmaf (fmaf (n0, y2, n1), y2, n2) * y2;\n    float den = fmaf (y2 + d0, y2, d1);\n    float quot = num / den;\n    float res = fmaf (quot, y, y);\n    return res;\n}\n\nint main (void)\n{\n    double ref, abserr, relerr, maxabserr = 0, maxrelerr = 0;\n    float arg, res, maxabserrloc = INFINITY, maxrelerrloc = INFINITY;\n\n    maxabserr = 0;\n    maxrelerr = 0;\n    arg = 0.0f;\n    while (arg < INFINITY) {\n        res = fast_tanhf_rat3 (arg);\n        if (res > 1) { \n            printf ("error at %15.8e: result out of bounds\\n", arg);\n            return EXIT_FAILURE;\n        }\n        ref = tanh ((double)arg);\n        abserr = fabs ((double)res - ref);\n        if (abserr > maxabserr) {\n            maxabserr = abserr;\n            maxabserrloc = arg;\n        }\n        relerr = fabs (((double)res - ref) / ref);\n        if (relerr > maxrelerr) {\n            maxrelerr = relerr;\n            maxrelerrloc = arg;\n        }\n        arg = nextafterf (arg, INFINITY);\n    }\n    arg = -0.0f;\n    while (arg > -INFINITY) {\n        res = fast_tanhf_rat3 (arg);\n        if (res < -1) { \n            printf ("error at %15.8e: result out of bounds\\n", arg);\n            return EXIT_FAILURE;\n        }\n        ref = tanh ((double)arg);\n        abserr = fabs ((double)res - ref);\n        if (abserr > maxabserr) {\n            maxabserr = abserr;\n            maxabserrloc = arg;\n        }\n        relerr = fabs (((double)res - ref) / ref);\n        if (relerr > maxrelerr) {\n            maxrelerr = relerr;\n            maxrelerrloc = arg;\n        }\n        arg = nextafterf (arg, -INFINITY);\n    }\n    printf ("maximum absolute error = %15.8e @ %15.8e\\n", maxabserr, maxabserrloc);\n    printf ("maximum relative error = %15.8e @ %15.8e\\n", maxrelerr, maxrelerrloc);\n    return EXIT_SUCCESS;\n}\n
Run Code Online (Sandbox Code Playgroud)\n


Dav*_*tat 8

Nic Sc​​hraudolph是描述本答案之前版本使用的指数近似的论文的作者,他提出了以下建议。其误差为0.5%。

Java 实现(用于便携式位修改):

public class Tanh {
  private static final float m = (float)((1 << 23) / Math.log(2));
  private static final int b = Float.floatToRawIntBits(1);

  private static float tanh(float x) {
    int y = (int)(m * x);
    float exp_x = Float.intBitsToFloat(b + y);
    float exp_minus_x = Float.intBitsToFloat(b - y);
    return (exp_x - exp_minus_x) / (exp_x + exp_minus_x);
  }

  public static void main(String[] args) {
    double error = 0;
    int end = Float.floatToRawIntBits(10);
    for (int i = 0; i <= end; i++) {
      float x = Float.intBitsToFloat(i);
      error = Math.max(error, Math.abs(tanh(x) - Math.tanh(x)));
    }
    System.out.println(error);
  }
}
Run Code Online (Sandbox Code Playgroud)