Catch and compute overflow during multiplication of two large integers

Question

Ben*_*Ben 60 c integer bit-manipulation overflow multiplication

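I am looking for an efficient (optionally standard, elegant and easy to implement) solution to multiply relatively large numbers and store the result in one or several integers:

Let's say I have two 64-bit integers declared like this: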

uint64_t a = xxx, b = yyy;

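When I do a * b, how can I detect whether the operation resulted in an overflow, and in that case store the carry somewhere?

Please note that I don't want to use any large-number library, since I have constraints on the way I store the numbers.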

Answer 1

mer*_*ike 73

1. Detecting the overflow:

x = a * b;
if (a != 0 && x / a != b) {
    // overflow handling
}

Edit: Fixed division by 0 (thanks Mark!)

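2. Computing the carry is quite involved. One approach is to split both operands into half-words, then apply long multiplication to the half-words: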

#include <stdint.h>

uint64_t hi(uint64_t x) {
    return x >> 32;
}

uint64_t lo(uint64_t x) {
    // shift a 64-bit 1: plain 1L is only 32 bits wide on some platforms
    return (((uint64_t)1 << 32) - 1) & x;
}

// Stores the low 64 bits of a * b in *result and the high 64 bits in *carry.
void multiply(uint64_t a, uint64_t b, uint64_t *result, uint64_t *carry) {
    // actually uint32_t would do, but the casting is annoying
    uint64_t s0, s1, s2, s3;

    uint64_t x = lo(a) * lo(b);
    s0 = lo(x);

    x = hi(a) * lo(b) + hi(x);
    s1 = lo(x);
    s2 = hi(x);

    x = s1 + lo(a) * hi(b);
    s1 = lo(x);

    x = s2 + hi(a) * hi(b) + hi(x);
    s2 = lo(x);
    s3 = hi(x);

    *result = s1 << 32 | s0;
    *carry = s3 << 32 | s2;
}

To see that none of the partial sums can overflow by itself, consider the worst case:

        x = s2 + hi(a) * hi(b) + hi(x)

Let B = 1 << 32. Then we have

            x <= (B - 1) + (B - 1)(B - 1) + (B - 1)
              <= B*B - 1
               < B*B

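As a sanity check on the bound above, the worst case can be run through the same half-word scheme. The following is only a minimal sketch: `mul_64x64` is a hypothetical name that does not appear in the answer, and returning the low word directly with the high word through an out-parameter is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 64x64 -> 128-bit product built from 32-bit halves.
   Returns the low 64 bits and stores the high 64 bits (the carry) in *hi. */
static uint64_t mul_64x64(uint64_t a, uint64_t b, uint64_t *hi)
{
    const uint64_t mask = 0xFFFFFFFFu;          /* low 32 bits */
    uint64_t a0 = a & mask, a1 = a >> 32;
    uint64_t b0 = b & mask, b1 = b >> 32;

    assert(hi != NULL);

    uint64_t x  = a0 * b0;                      /* lowest column */
    uint64_t s0 = x & mask;

    x = a1 * b0 + (x >> 32);                    /* first middle term plus carry */
    uint64_t s1 = x & mask;
    uint64_t s2 = x >> 32;

    x = s1 + a0 * b1;                           /* second middle term */
    s1 = x & mask;

    x = s2 + a1 * b1 + (x >> 32);               /* the worst-case sum from the text */

    *hi = x;
    return (s1 << 32) | s0;
}
```

Calling `mul_64x64(UINT64_MAX, UINT64_MAX, &hi)` should return 1 and set `hi` to 0xFFFFFFFFFFFFFFFE, because (2^64 - 1)^2 = 2^128 - 2^65 + 1.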

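I believe this will work; at least it handles Sjlver's test case. Apart from that, it is untested (and might not even compile, as I no longer have a C++ compiler at hand).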

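The problem with this answer is that signed integer overflow causes undefined behavior in C: you cannot rely on your code continuing to execute correctly after such an event (or at all!). (On some platforms, overflow triggers a program abort, much like division by zero.) sergdev's answer is correct. (8 upvotes)
caf's comment is incorrect. The C99 standard states: "A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type." So meriton's solution is valid both in theory and in practice. (7 upvotes)
Are you sure this is right? Consider a = 7 and b = 613612691. This computation overflows (for 32 bits), but the carry (according to you) is zero. Sorry for coming back to this after more than three years... but it would be a pity if StackOverflow kept a wrong answer as accepted. (4 upvotes)
Fixed. (I write a lot of Java code, where >> is the sign-extending right shift operator and >>> is the right shift without sign extension. In C there is only >>, and whether it sign-extends may depend on the signedness of the integer operand.) (2 upvotes)
Suppose `a * b` overflows. Then the wrapped result satisfies `a * b ≤ ab - 2^64`, where ab is the true mathematical product, and therefore `a * b / a ≤ ab/a - (2^64/a) < b`. (2 upvotes)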

Answer 2

ser*_*gtk 32

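The idea is to use the following fact, which holds for integer arithmetic:

a*b > c if and only if a > c/b

/ is integer division here.

The pseudocode to check for overflow with positive numbers is:

if (a > max_int64 / b) then "overflow" else "ok".

To handle zeros and negative numbers you need to add more checks.

C code for non-negative a and b follows: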

if (b > 0 && a > 18446744073709551615ULL / b) {
     // overflow handling
} else {
    c = a * b;
}

Note:

18446744073709551615 == 2^64 - 1 (available as UINT64_MAX in <stdint.h>)

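The pre-check above can be packaged as a function that never performs an overflowing multiplication. This is only a sketch under stated assumptions: `checked_mul_u64` is a hypothetical name, and the convention of returning true on success and leaving the output untouched on overflow is an illustrative choice, not something the answer specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores a * b in *out and returns true when the
   product fits in 64 bits; returns false and leaves *out untouched otherwise. */
static bool checked_mul_u64(uint64_t a, uint64_t b, uint64_t *out)
{
    assert(out != NULL);

    /* a * b > UINT64_MAX  if and only if  a > UINT64_MAX / b  (for b > 0) */
    if (b != 0 && a > UINT64_MAX / b)
        return false;

    *out = a * b;
    return true;
}
```

For example, `checked_mul_u64(UINT64_MAX, 2, &r)` should return false, while multiplying 0xFFFFFFFF by itself should succeed because the product still fits in 64 bits.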

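To compute the carry, we can split each number into two 32-bit digits and multiply them the way we would on paper. The split is needed to avoid overflow.

Code follows: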

// split input numbers into 32-bit digits
uint64_t a0 = a & ((1LL<<32)-1);
uint64_t a1 = a >> 32;
uint64_t b0 = b & ((1LL<<32)-1);
uint64_t b1 = b >> 32;


// d1 is the middle column of the schoolbook product:
//   d1 = (a0 * b0 >> 32) + a1 * b0 + a0 * b1
// That sum can exceed 64 bits, so it is built from two parts
// (d11 itself cannot overflow) and the bit lost in the 64-bit
// addition is recorded in c1.
uint64_t d11 = a1 * b0 + (a0 * b0 >> 32);
uint64_t d12 = a0 * b1;
uint64_t d1 = d11 + d12;            // wraps modulo 2^64
uint64_t c1 = (d1 < d11) ? 1 : 0;   // 1 if the addition wrapped

// high 64 bits of a * b: a1 * b1, plus the upper half of d1, plus the wrapped bit
uint64_t d2 = a1 * b1 + (d1 >> 32) + (c1 << 32);
uint64_t carry = d2; // needed carry stored here

Answer 3

Cha*_*acy 26

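While there are several other answers to this question, several of them contain code that is completely untested, and so far no one has adequately compared the different possible options.

For that reason, I wrote and tested several possible implementations (the last one is based on code from OpenBSD, which was discussed on Reddit). Here is the code: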

/* Multiply with overflow checking, emulating clang's builtin function
 *
 *     __builtin_umull_overflow
 *
 * This code benchmarks five possible schemes for doing so.
 */

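/* LONG_BIT is an XSI (POSIX) extension to <limits.h>; some C libraries
   (e.g. glibc) expose it only when a macro such as this one is defined. */
#define _XOPEN_SOURCE 700
#include <stddef.h>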
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <limits.h>

#ifndef BOOL
    #define BOOL int
#endif

// Option 1, check for overflow using a wider type
//    - Often fastest and the least code, especially on modern compilers
//    - When long is a 64-bit int, requires compiler support for 128-bit
//      ints (requires GCC >= 3.0 or Clang)

#if LONG_BIT > 32
    typedef __uint128_t long_overflow_t ;
#else
    typedef uint64_t long_overflow_t;
#endif

BOOL 
umull_overflow1(unsigned long lhs, unsigned long rhs, unsigned long* result)
{
        long_overflow_t prod = (long_overflow_t)lhs * (long_overflow_t)rhs;
        *result = (unsigned long) prod;
        return (prod >> LONG_BIT) != 0;
}

// Option 2, perform long multiplication using a smaller type
//    - Sometimes the fastest (e.g., when multiply on longs is a library
//      call).
//    - Performs at most three multiplies, and sometimes only performs one.
//    - Highly portable code; works no matter how many bits unsigned long is

BOOL 
umull_overflow2(unsigned long lhs, unsigned long rhs, unsigned long* result)
{
        const unsigned long HALFSIZE_MAX = (1ul << LONG_BIT/2) - 1ul;
        unsigned long lhs_high = lhs >> LONG_BIT/2;
        unsigned long lhs_low  = lhs & HALFSIZE_MAX;
        unsigned long rhs_high = rhs >> LONG_BIT/2;
        unsigned long rhs_low  = rhs & HALFSIZE_MAX;

        unsigned long bot_bits = lhs_low * rhs_low;
        if (!(lhs_high || rhs_high)) {
            *result = bot_bits;
            return 0; 
        }
        BOOL overflowed = lhs_high && rhs_high;
        unsigned long mid_bits1 = lhs_low * rhs_high;
        unsigned long mid_bits2 = lhs_high * rhs_low;

        *result = bot_bits + ((mid_bits1+mid_bits2) << LONG_BIT/2);
        return overflowed || *result < bot_bits
            || (mid_bits1 >> LONG_BIT/2) != 0
            || (mid_bits2 >> LONG_BIT/2) != 0;
}

// Option 3, perform long multiplication using a smaller type (this code is
// very similar to option 2, but calculates overflow using a different but
// equivalent method).
//    - Sometimes the fastest (e.g., when multiply on longs is a library
//      call; clang likes this code).
//    - Performs at most three multiplies, and sometimes only performs one.
//    - Highly portable code; works no matter how many bits unsigned long is

BOOL 
umull_overflow3(unsigned long lhs, unsigned long rhs, unsigned long* result)
{
        const unsigned long HALFSIZE_MAX = (1ul << LONG_BIT/2) - 1ul;
        unsigned long lhs_high = lhs >> LONG_BIT/2;
        unsigned long lhs_low  = lhs & HALFSIZE_MAX;
        unsigned long rhs_high = rhs >> LONG_BIT/2;
        unsigned long rhs_low  = rhs & HALFSIZE_MAX;

        unsigned long lowbits = lhs_low * rhs_low;
        if (!(lhs_high || rhs_high)) {
            *result = lowbits;
            return 0; 
        }
        BOOL overflowed = lhs_high && rhs_high;
        unsigned long midbits1 = lhs_low * rhs_high;
        unsigned long midbits2 = lhs_high * rhs_low;
        unsigned long midbits  = midbits1 + midbits2;
        overflowed = overflowed || midbits < midbits1 || midbits > HALFSIZE_MAX;
        unsigned long product = lowbits + (midbits << LONG_BIT/2);
        overflowed = overflowed || product < lowbits;

        *result = product;
        return overflowed;
}

// Option 4, checks for overflow using division
//    - Checks for overflow using division
//    - Division is slow, especially if it is a library call

BOOL
umull_overflow4(unsigned long lhs, unsigned long rhs, unsigned long* result)
{
        *result = lhs * rhs;
        return rhs > 0 && (ULONG_MAX / rhs) < lhs;  // operands are unsigned long, so compare against ULONG_MAX
}

// Option 5, checks for overflow using division
//    - Checks for overflow using division
//    - Avoids division when the numbers are "small enough" to trivially
//      rule out overflow
//    - Division is slow, especially if it is a library call

BOOL
umull_overflow5(unsigned long lhs, unsigned long rhs, unsigned long* result)
{
        const unsigned long MUL_NO_OVERFLOW = (1ul << LONG_BIT/2) - 1ul;
        *result = lhs * rhs;
        return (lhs >= MUL_NO_OVERFLOW || rhs >= MUL_NO_OVERFLOW) &&
            rhs > 0 && ULONG_MAX / rhs < lhs;  // operands are unsigned long, so compare against ULONG_MAX
}

#ifndef umull_overflow
    #define umull_overflow umull_overflow2
#endif

/*
 * This benchmark code performs a multiply at all bit sizes, 
 * essentially assuming that sizes are logarithmically distributed.
 */

int main()
{
        unsigned long i, j, k;
        int count = 0;
        unsigned long mult;
        unsigned long total = 0;

        for (k = 0; k < 0x40000000 / LONG_BIT / LONG_BIT; ++k)
                for (i = 0; i != LONG_MAX; i = i*2+1)
                        for (j = 0; j != LONG_MAX; j = j*2+1) {
                                count += umull_overflow(i+k, j+k, &mult);
                                total += mult;
                        }
        printf("%d overflows (total %lu)\n", count, total);
}

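Here are the results, testing with the various compilers and systems I have (in this case, all testing was done on OS X, but results should be similar on BSD or Linux systems):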

+------------------+----------+----------+----------+----------+----------+
|                  | Option 1 | Option 2 | Option 3 | Option 4 | Option 5 |
|                  |  BigInt  | LngMult1 | LngMult2 |   Div    |  OptDiv  |
+------------------+----------+----------+----------+----------+----------+
| Clang 3.5 i386   |    1.610 |    3.217 |    3.129 |    4.405 |    4.398 |
| GCC 4.9.0 i386   |    1.488 |    3.469 |    5.853 |    4.704 |    4.712 |
| GCC 4.2.1 i386   |    2.842 |    4.022 |    3.629 |    4.160 |    4.696 |
| GCC 4.2.1 PPC32  |    8.227 |    7.756 |    7.242 |   20.632 |   20.481 |
| GCC 3.3   PPC32  |    5.684 |    9.804 |   11.525 |   21.734 |   22.517 |
+------------------+----------+----------+----------+----------+----------+
| Clang 3.5 x86_64 |    1.584 |    2.472 |    2.449 |    9.246 |    7.280 |
| GCC 4.9 x86_64   |    1.414 |    2.623 |    4.327 |    9.047 |    7.538 |
| GCC 4.2.1 x86_64 |    2.143 |    2.618 |    2.750 |    9.510 |    7.389 |
| GCC 4.2.1 PPC64  |   13.178 |    8.994 |    8.567 |   37.504 |   29.851 |
+------------------+----------+----------+----------+----------+----------+

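Based on these results, we can draw a few conclusions:

Clearly, the division-based approach, although simple and easy to apply, is slow.
No technique is a clear winner in all cases.
On modern compilers, the wider-integer approach is best, if you can use it.
On older compilers, the long-multiplication approach is best.
Surprisingly, GCC 4.9.0 has performance regressions relative to GCC 4.2.1, and GCC 4.2.1 has performance regressions relative to GCC 3.3.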
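
The wider-type idea can also be written in fully standard C when the operands are 32 bits wide, because `uint64_t` is then always wide enough to hold the product. The following is a rough sketch, not one of the benchmarked options: `umul32_overflow` is a hypothetical name, and it follows the same convention as the functions above (wrapped result through the pointer, nonzero return on overflow).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: multiplies two 32-bit values in a 64-bit type,
   stores the truncated (wrapped) product in *result and reports whether
   the full product needed more than 32 bits. */
static bool umul32_overflow(uint32_t lhs, uint32_t rhs, uint32_t *result)
{
    assert(result != NULL);

    uint64_t prod = (uint64_t)lhs * rhs;   /* cannot overflow 64 bits */
    *result = (uint32_t)prod;              /* low 32 bits */
    return prod > UINT32_MAX;              /* any high bit set means overflow */
}
```

With 7 and 613612691 as inputs, this should report an overflow, since the true product 4295288837 is larger than 2^32 - 1.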

Answer 4

Mar*_*ers 9

A version that also works when a == 0:

    x = a * b;
    if (a != 0 && x / a != b) {
        // overflow handling
    }

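Wrapped in a function with fixed-width unsigned types, the check might look like the sketch below. `mul_wrapped` is a hypothetical name, and using `uint64_t` is an assumption taken from the question; the approach relies on unsigned arithmetic wrapping modulo 2^64, so it should not be applied to signed operands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores the wrapped product in *product and returns
   true if the mathematical product did not fit in 64 bits. */
static bool mul_wrapped(uint64_t a, uint64_t b, uint64_t *product)
{
    assert(product != NULL);

    *product = a * b;                      /* well defined: wraps modulo 2^64 */
    return a != 0 && *product / a != b;    /* dividing back fails after a wrap */
}
```

Unlike a check that divides before multiplying, this variant always produces the wrapped low word, which can be useful when the truncated result is still needed.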

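@kevinf In the C standard, signed integer overflow is undefined behavior. This means the compiler may assume that any "obvious" overflow "cannot happen", need not "care about the consequences", and may optimize accordingly. For example, because of the preceding assignment, the compiler can expand `x / a != b` into `((a * b) / a != b)`, then reduce it to `b != b`, and finally to `false`. That is how an overflow check can stop working because of compiler optimization. ... (3 upvotes)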

Answer 5

Nor*_*sey 8

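If you need not just to detect overflow but also to capture the carry, you are best off breaking your numbers down into 32-bit parts. The code is a nightmare; what follows is only a sketch: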

#include <stdint.h>

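// Returns the low 64 bits of a * b and stores the high 64 bits in *carry_out.
uint64_t mul(uint64_t a, uint64_t b, uint64_t *carry_out) {
  uint32_t ah = a >> 32;
  uint32_t al = a;  // truncates: now a = al + 2**32 * ah
  uint32_t bh = b >> 32;
  uint32_t bl = b;  // truncates: now b = bl + 2**32 * bh
  // a * b = 2**64 * ah * bh + 2**32 * (ah * bl + bh * al) + al * bl
  uint64_t partial = (uint64_t) al * (uint64_t) bl;
  uint64_t mid1    = (uint64_t) ah * (uint64_t) bl;
  uint64_t mid2    = (uint64_t) al * (uint64_t) bh;
  uint64_t carry   = (uint64_t) ah * (uint64_t) bh;
  // add high parts of mid1 and mid2 to carry
  carry += (mid1 >> 32) + (mid2 >> 32);
  // add low parts of mid1 and mid2 to partial, carrying
  //    any carry bits into carry
  // (three 32-bit values summed in 64 bits, so this sum cannot overflow)
  uint64_t mid = (partial >> 32) + (uint32_t) mid1 + (uint32_t) mid2;
  carry += mid >> 32;
  *carry_out = carry;
  return (mid << 32) | (uint32_t) partial;
}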

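The problem is not just the partial products but the fact that any of the sums can overflow.

If I had to do this for real, I would write an extended-multiply routine in the local assembly language. That is, for example, multiply two 64-bit integers to get a 128-bit result, which is stored in two 64-bit registers. All reasonable hardware provides this functionality in a single native multiply instruction; it is just not accessible from C.

This is one of those rare cases where the solution that is most elegant and easy to program is actually to use assembly language. But it is certainly not portable :-(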
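
As an alternative to hand-written assembly, and not what this answer proposes, some compilers offer a 128-bit integer type as an extension, which usually lets them emit the native widening multiply themselves. The sketch below assumes GCC or Clang on a 64-bit target, where `unsigned __int128` is available; `mul_wide` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper; relies on the non-standard unsigned __int128 type
   (a GCC/Clang extension on 64-bit targets, not part of ISO C).
   Returns the low 64 bits of a * b and stores the high 64 bits in *carry. */
static uint64_t mul_wide(uint64_t a, uint64_t b, uint64_t *carry)
{
    assert(carry != NULL);

    unsigned __int128 p = (unsigned __int128)a * b;  /* full 128-bit product */
    *carry = (uint64_t)(p >> 64);                    /* upper word */
    return (uint64_t)p;                              /* lower word */
}
```

Like the assembly route, this is not portable: it only builds where the compiler provides the extended type.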

Answer 6

小智 5

Simple and fast with clang and gcc:

unsigned long long a, b, result;
if (__builtin_umulll_overflow(a, b, &result)) {
    // overflow!!
}

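This will use hardware support for overflow detection where available. Because it is a compiler extension, it can even handle signed integer overflow (replace umul with smul), even though that is undefined behavior in C++.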
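
A slightly fuller sketch of both the unsigned and the signed builtin follows. It assumes a compiler that provides `__builtin_umulll_overflow` and `__builtin_smulll_overflow` (GCC 5 or later, or a recent Clang); the wrapper names `umul_ok` and `smul_ok` are hypothetical and exist only to make the return convention explicit.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrappers around the GCC/Clang builtins. Each returns true
   when the product fits and has been stored in *out; the builtins
   themselves return true on overflow, hence the negation. */
static bool umul_ok(unsigned long long a, unsigned long long b,
                    unsigned long long *out)
{
    assert(out != NULL);
    return !__builtin_umulll_overflow(a, b, out);
}

static bool smul_ok(long long a, long long b, long long *out)
{
    assert(out != NULL);
    return !__builtin_smulll_overflow(a, b, out);
}
```

The signed wrapper should reject cases such as `LLONG_MIN * -1`, which would otherwise be undefined behavior if computed with the plain `*` operator.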

Asked:	15 years, 11 months ago
Viewed:	50064 times
Last active:	5 years, 11 months ago