Hashing a file with SHA using mbedtls on a memory-constrained system

Question

Jad*_*ade 2 c sha mbedtls

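I want to calculate the SHA-256 hash of a file that is larger than 1 MB. In order to get this hash value using the mbedtls library, I need to copy the whole file into memory. But my system has only 100 KB of memory. So I want to know whether there is a way to calculate the file hash in parts.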

Answer 1

f9c*_*534 6

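"In order to get this hash value using the mbedtls library, I need to copy the whole file into memory."

This is not accurate. The mbedtls library supports incremental calculation of hash values.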

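To calculate a SHA-256 hash with mbedtls, you have to take the following steps (reference):

Create an instance of the mbedtls_sha256_context struct.
Initialize the context with mbedtls_sha256_init and then mbedtls_sha256_starts_ret.
Feed data into the hash function with mbedtls_sha256_update_ret.
Calculate the final hash sum with mbedtls_sha256_finish_ret.
Free the context with mbedtls_sha256_free.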

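Note that this does not mean that the mbedtls_sha256_context struct holds the entire data until mbedtls_sha256_finish_ret is called. Instead, mbedtls_sha256_context only holds the intermediate result of the hash calculation. When additional data is fed into the hash function with mbedtls_sha256_update_ret, the state of the calculation is updated and the new intermediate result is stored in mbedtls_sha256_context.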
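The start/update/finish pattern can be illustrated without mbedtls at all. The following is a minimal sketch that swaps in the much simpler FNV-1a checksum, which is NOT SHA-256 and not cryptographically secure. All names in it (toy_hash_context, toy_hash_starts, toy_hash_update, toy_hash_finish, toy_hash_oneshot, toy_hash_chunked) are hypothetical and not part of mbedtls. It only shows that a fixed-size context fed in chunks gives the same result as hashing everything in one call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy context: the whole state is one 32-bit word.
 * This is FNV-1a, used only as a stand-in; it is NOT SHA-256. */
typedef struct {
    uint32_t state;
} toy_hash_context;

static void toy_hash_starts(toy_hash_context *ctx) {
    ctx->state = 2166136261u; /* FNV-1a 32-bit offset basis */
}

/* Fold len bytes into the state; the data itself is never stored. */
static void toy_hash_update(toy_hash_context *ctx,
                            const unsigned char *data, size_t len) {
    for (size_t i = 0; i < len; i++) {
        ctx->state ^= data[i];
        ctx->state *= 16777619u; /* FNV-1a 32-bit prime */
    }
}

static uint32_t toy_hash_finish(const toy_hash_context *ctx) {
    return ctx->state;
}

/* Hash the whole input with a single update call. */
static uint32_t toy_hash_oneshot(const unsigned char *data, size_t len) {
    toy_hash_context ctx;
    toy_hash_starts(&ctx);
    toy_hash_update(&ctx, data, len);
    return toy_hash_finish(&ctx);
}

/* Hash the same input in chunks of at most chunk_size bytes. */
static uint32_t toy_hash_chunked(const unsigned char *data, size_t len,
                                 size_t chunk_size) {
    assert(chunk_size > 0);
    toy_hash_context ctx;
    toy_hash_starts(&ctx);
    for (size_t off = 0; off < len; off += chunk_size) {
        size_t n = (len - off < chunk_size) ? (len - off) : chunk_size;
        toy_hash_update(&ctx, data + off, n);
    }
    return toy_hash_finish(&ctx);
}
```

Calling toy_hash_chunked on a buffer with any positive chunk size returns the same value as toy_hash_oneshot on that buffer. The mbedtls SHA-256 functions follow the same pattern, with a larger context and a block buffer.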

The total size of an mbedtls_sha256_context, as determined by sizeof(mbedtls_sha256_context), is 108 bytes on my system. We can also see this from the mbedtls source code (reference):

typedef struct mbedtls_sha256_context
{
    uint32_t total[2];          /*!< The number of Bytes processed.  */
    uint32_t state[8];          /*!< The intermediate digest state.  */
    unsigned char buffer[64];   /*!< The data block being processed. */
    int is224;                  /*!< Determines which function to use:
                                     0: Use SHA-256, or 1: Use SHA-224. */
}
mbedtls_sha256_context;

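We can see that the struct holds a counter of size 2*32 bit = 8 byte that keeps track of the total number of bytes processed so far. 8*32 bit = 32 byte are used to track the intermediate result of the hash calculation. 64 byte are used to hold the data block currently being processed. As you can see, this is a fixed-size buffer that does not grow with the amount of data being hashed. Finally, an int is used to distinguish between SHA-224 and SHA-256. On my system, sizeof(int) == 4. So in total, we get 8+32+64+4 = 108 byte.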
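The arithmetic can be checked with a small sketch that does not need mbedtls. It assumes a struct that mirrors the fields quoted above; the names sha256_context_mirror, mirror_field_sum and mirror_padding are hypothetical. With a real mbedtls build, sizeof(mbedtls_sha256_context) is the authoritative value, and the layout may differ between versions and configurations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the struct quoted above; this is not the
 * mbedtls type itself, only a copy of its fields for illustration. */
typedef struct {
    uint32_t total[2];          /* byte counter:       2 * 4 =  8 bytes */
    uint32_t state[8];          /* intermediate state: 8 * 4 = 32 bytes */
    unsigned char buffer[64];   /* current data block:         64 bytes */
    int is224;                  /* SHA-224/SHA-256 flag: sizeof(int)    */
} sha256_context_mirror;

/* Sum of the individual field sizes, ignoring any padding.
 * sizeof does not evaluate its operand, so the null pointer is never
 * dereferenced. */
static size_t mirror_field_sum(void) {
    return sizeof(((sha256_context_mirror *)0)->total)
         + sizeof(((sha256_context_mirror *)0)->state)
         + sizeof(((sha256_context_mirror *)0)->buffer)
         + sizeof(((sha256_context_mirror *)0)->is224);
}

/* Number of padding bytes the compiler added, if any. */
static size_t mirror_padding(void) {
    return sizeof(sha256_context_mirror) - mirror_field_sum();
}
```

On a platform where sizeof(int) == 4 and the compiler adds no padding, mirror_field_sum() and sizeof(sha256_context_mirror) both come out at 108, matching the figure above.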

Consider the following example program, which reads a file step by step into a buffer of size 4096 and feeds the buffer into the hash function in each step:

#include <mbedtls/sha256.h>

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define BUFFER_SIZE 4096
#define HASH_SIZE 32

int main(void) {
  int ret;

  // Initialize hash
  mbedtls_sha256_context ctx;
  mbedtls_sha256_init(&ctx);
  mbedtls_sha256_starts_ret(&ctx, /*is224=*/0);

  // Open file in binary mode so that no bytes are translated on read
  FILE *fp = fopen("large_file", "rb");
  if (fp == NULL) {
    ret = EXIT_FAILURE;
    goto exit;
  }

  // Read file in chunks of size BUFFER_SIZE and feed each chunk to the hash
  uint8_t buffer[BUFFER_SIZE];
  size_t read;
  while ((read = fread(buffer, 1, BUFFER_SIZE, fp)) > 0) {
    mbedtls_sha256_update_ret(&ctx, buffer, read);
  }

  // Calculate final hash sum
  uint8_t hash[HASH_SIZE];
  mbedtls_sha256_finish_ret(&ctx, hash);

  // Simple debug printing. Use MBEDTLS_SSL_DEBUG_BUF in a real program.
  for (size_t i = 0; i < HASH_SIZE; i++) {
    printf("%02x", hash[i]);
  }
  printf("\n");

  // Cleanup
  fclose(fp);
  ret = EXIT_SUCCESS;

exit:
  mbedtls_sha256_free(&ctx);
  return ret;
}

When running the program on a large sample file, the following behavior can be observed:

$ dd if=/dev/random of=large_file bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB, 977 MiB) copied, 5.78353 s, 177 MB/s
$ sha256sum large_file
ae2d3b46eec018e006533da47a80e933a741a8b1320cfce7392a5472faae0216  large_file
$ gcc -O3 -static test.c /usr/lib/libmbedcrypto.a
$ ./a.out
ae2d3b46eec018e006533da47a80e933a741a8b1320cfce7392a5472faae0216

We can see that the program calculates the correct SHA-256 hash. Inspecting the memory used by the program shows that it consumed at most 824 KB. Thus, we have calculated the hash of a 1 GB file with less than 1 MB of memory. This shows that we do not have to load the entire file into memory at once to calculate its hash with mbedtls.

Keep in mind that this measurement was done on a 64-bit desktop computer, not an embedded platform. Also, no further optimizations were performed besides -O3 and static linking (the latter approximately halved the memory usage of the program). I would expect the memory footprint to be even smaller on an embedded device with a smaller address size and a toolchain performing further optimizations.
