如何编写可靠的独立于内容的 memcmp() 实现?

sha*_*oth 5 c security compiler-optimization memcmp timing-attack

一个简单的实现memcmp()是这样的(来自这个答案):

int memcmp_test(const char *cs_in, const char *ct_in, size_t n)
{
   size_t i;  
   const unsigned char * cs = (const unsigned char*) cs_in;
   const unsigned char * ct = (const unsigned char*) ct_in;

   for (i = 0; i < n; i++, cs++, ct++)
   {
       if (*cs < *ct)
       {
          return -1;
       }
       else if (*cs > *ct)
       {
          return 1;
       }
   }
   return 0;
}
Run Code Online (Sandbox Code Playgroud)

一旦找到第一个不匹配字节,块遍历就会停止。这对于加密应用程序来说可能没有好处,因为它使得执行时间依赖于块内容,并且这可能允许定时攻击。所以 OpenSSL 使用这个(取自这里):

int CRYPTO_memcmp(const void *in_a, const void *in_b, size_t len)
{
    size_t i;
    const unsigned char *a = in_a;
    const unsigned char *b = in_b;
    unsigned char x = 0;

    for (i = 0; i < len; i++)
         x |= a[i] ^ b[i];

    return x;
}
Run Code Online (Sandbox Code Playgroud)

中间没有breaks 或s,因此这段代码必须遍历块的整个长度。return至少这是意图。

现在这是一个用法示例(来自此处):

 static int des_ede3_unwrap(EVP_CIPHER_CTX *ctx,
     unsigned char *out, const unsigned char *in, size_t inl)
 {
      unsigned char icv[8], iv[8], sha1tmp[SHA_DIGEST_LENGTH];
      //whatever, unrelated then...
      if (!CRYPTO_memcmp(sha1tmp, icv, 8))
         rv = inl - 16;
      //whatever, unrelated
 }
Run Code Online (Sandbox Code Playgroud)

现在,通过链接时代码生成 (Visual C++ LTCG) 或链接时优化 (gcc LTO),编译器能够查看CRYPTO_memcmp()实现和调用站点(即使它们位于不同的翻译单元中)。可以看到,调用站点没有使用实际值,只是将其与 null 进行比较。因此,它可以自由地进行转换,CRYPTO_memcmp()一旦发现第一个不匹配的字节对,并且“安全”版本memcmp()不再安全,它就会立即返回。

如何memcmp()实现才能使符合标准的编译器不会将其转换为有助于定时攻击的版本?

sha*_*oth 4

There're two possible solutions.

The first one is absolutely portable and up to the Standard - declare x volatile which basically tells the compiler that it must preserve the sequence of updating x and so it must read both data arrays fully. It doesn't prevent the compiler from making those reads in larger portions (such as read several bytes at a time and then use them in the right sequence), but it's no big deal here - the compiler will have to emit a number of reads proportional to the number of bytes in the data arrays. The problem with this approach is that it can make this code slower - some benchmarks I ran show about 50 percent slowdown on a specific processor and specific toolset. YMMV.

The second possible solution is to cast the pointers into volatile unsigned char* and access through them

const volatile unsigned char * cs = (const volatile unsigned char*) cs_in;
const volatile unsigned char * ct = (const volatile unsigned char*) ct_in;
// the rest of the code is the same
Run Code Online (Sandbox Code Playgroud)

which is just as fast but is not fully Standard compliant (see this). Many compiler treat such cast as a hint that these reads should not be altered but the Standard doesn't really make any guarantees for that and so it is possible that a compiler breaks this code on purpose. Therefore this solution is not portable.