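I want to calculate the SHA-256 hash of a file that is larger than 1 MB. To get this hash with the mbedtls library, I need to copy the whole file into memory. But I only have 100 KB of memory. So I would like to know whether there is a way to calculate the hash of the file in parts.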
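"To get this hash with the mbedtls library, I need to copy the whole file into memory."
This is not accurate. The mbedtls library supports incremental calculation of hash values.
To calculate a SHA-256 hash with mbedtls, you have to take the following steps (reference):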
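1. Create an instance of the mbedtls_sha256_context struct.
2. Initialize the context with mbedtls_sha256_init and then mbedtls_sha256_starts_ret.
3. Feed data into the hash function with mbedtls_sha256_update_ret.
4. Calculate the final hash sum with mbedtls_sha256_finish_ret.
5. Free the context with mbedtls_sha256_free.
Note that this does not mean that the mbedtls_sha256_context struct holds the entire data until mbedtls_sha256_finish_ret is called. Instead, mbedtls_sha256_context only holds the intermediate result of the hash calculation. When additional data is fed into the hash function with mbedtls_sha256_update_ret, the state of the calculation is updated and the new intermediate result is stored in mbedtls_sha256_context.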
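To make the point about the fixed-size state concrete, here is a minimal sketch of the same starts/update/finish pattern. It deliberately uses a toy 32-bit FNV-1a checksum as a stand-in, so it is neither SHA-256 nor mbedtls code, and all toy_hash_* names are made up for illustration. It is only meant to show that the context stays the same size however much data is fed in, and that feeding the data in pieces gives the same result as feeding it all at once.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: 32-bit FNV-1a. NOT SHA-256 and NOT part of mbedtls. */
typedef struct toy_hash_context {
    uint32_t state; /* the only data kept between updates */
} toy_hash_context;

/* Reset the context to the FNV-1a offset basis. */
void toy_hash_starts(toy_hash_context *ctx) {
    ctx->state = 2166136261u;
}

/* Mix `len` more bytes into the running state; nothing else is stored. */
void toy_hash_update(toy_hash_context *ctx, const unsigned char *data, size_t len) {
    for (size_t i = 0; i < len; i++) {
        ctx->state ^= data[i];
        ctx->state *= 16777619u; /* FNV prime, wraps modulo 2^32 */
    }
}

/* Return the final value; for this toy hash it is simply the state. */
uint32_t toy_hash_finish(const toy_hash_context *ctx) {
    return ctx->state;
}

/* Hash `len` bytes by feeding them in pieces of at most `chunk` bytes. */
uint32_t toy_hash_chunked(const unsigned char *data, size_t len, size_t chunk) {
    toy_hash_context ctx;
    if (chunk == 0) {
        chunk = 1; /* avoid an endless loop on a zero chunk size */
    }
    toy_hash_starts(&ctx);
    for (size_t off = 0; off < len; off += chunk) {
        size_t n = (len - off < chunk) ? (len - off) : chunk;
        toy_hash_update(&ctx, data + off, n);
    }
    return toy_hash_finish(&ctx);
}
```

For any input, toy_hash_chunked(data, len, 4096) and toy_hash_chunked(data, len, len) return the same value. The SHA-256 functions in mbedtls behave the same way with respect to chunk size, except that their context also buffers one partial 64-byte block.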
The total size of an mbedtls_sha256_context, as determined by sizeof(mbedtls_sha256_context), is 108 bytes on my system. We can also see this from the mbedtls source code (reference):
typedef struct mbedtls_sha256_context
{
    uint32_t total[2];          /*!< The number of Bytes processed.  */
    uint32_t state[8];          /*!< The intermediate digest state.  */
    unsigned char buffer[64];   /*!< The data block being processed. */
    int is224;                  /*!< Determines which function to use:
                                     0: Use SHA-256, or 1: Use SHA-224. */
}
mbedtls_sha256_context;
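We can see that the struct holds a counter of size 2*32 bit = 8 byte that keeps track of the total number of bytes processed so far. 8*32 bit = 32 byte are used to track the intermediate result of the hash calculation. 64 byte are used to track the data block currently being processed. As you can see, this is a fixed-size buffer that does not grow with the amount of data being hashed. Finally, an int is used to distinguish between SHA-224 and SHA-256. On my system sizeof(int) == 4. So in total we get 8+32+64+4 = 108 byte.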
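The arithmetic above can be checked with a small sketch. The struct below is a hypothetical mirror of the fields shown in the mbedtls source (the name sha256_context_mirror and both helper functions are invented here), so that the check compiles without the library. It assumes a 4-byte int, as in the measurement above; on platforms with a different int size the total will differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the fields listed above; not the real mbedtls type. */
typedef struct sha256_context_mirror {
    uint32_t total[2];         /* byte counter:        2 * 4 =  8 bytes */
    uint32_t state[8];         /* intermediate digest: 8 * 4 = 32 bytes */
    unsigned char buffer[64];  /* current data block:          64 bytes */
    int is224;                 /* SHA-224/SHA-256 flag: sizeof(int) bytes */
} sha256_context_mirror;

/* Sum of the member sizes, ignoring any padding the compiler might add. */
size_t mirror_member_sum(void) {
    return sizeof(uint32_t[2]) + sizeof(uint32_t[8])
         + sizeof(unsigned char[64]) + sizeof(int);
}

/* Size of the whole struct as the compiler actually lays it out. */
size_t mirror_struct_size(void) {
    return sizeof(sha256_context_mirror);
}
```

With a 4-byte int both functions return 108, because every member already ends on a 4-byte boundary and no padding is needed.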
Consider the following example program, which reads a file step by step into a buffer of size 4096 and feeds the buffer into the hash function in each step:
#include <mbedtls/sha256.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define BUFFER_SIZE 4096
#define HASH_SIZE 32

int main(void) {
    int ret = EXIT_FAILURE;

    // Initialize hash. Mbed TLS 3.x names these functions without the _ret suffix.
    mbedtls_sha256_context ctx;
    mbedtls_sha256_init(&ctx);
    mbedtls_sha256_starts_ret(&ctx, /*is224=*/0);

    // Open file in binary mode so that the bytes are read unmodified
    FILE *fp = fopen("large_file", "rb");
    if (fp == NULL) {
        goto exit;
    }

    // Read file in chunks of size BUFFER_SIZE
    uint8_t buffer[BUFFER_SIZE];
    size_t read;
    while ((read = fread(buffer, 1, BUFFER_SIZE, fp)) > 0) {
        mbedtls_sha256_update_ret(&ctx, buffer, read);
    }
    if (ferror(fp)) {
        // After a read error the hash would not cover the whole file
        fclose(fp);
        goto exit;
    }

    // Calculate final hash sum
    uint8_t hash[HASH_SIZE];
    mbedtls_sha256_finish_ret(&ctx, hash);

    // Simple debug printing. Use MBEDTLS_SSL_DEBUG_BUF in a real program.
    for (size_t i = 0; i < HASH_SIZE; i++) {
        printf("%02x", hash[i]);
    }
    printf("\n");

    // Cleanup
    fclose(fp);
    ret = EXIT_SUCCESS;

exit:
    mbedtls_sha256_free(&ctx);
    return ret;
}
When running the program on a large sample file, the following behavior can be observed:
$ dd if=/dev/random of=large_file bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB, 977 MiB) copied, 5.78353 s, 177 MB/s
$ sha256sum large_file
ae2d3b46eec018e006533da47a80e933a741a8b1320cfce7392a5472faae0216 large_file
$ gcc -O3 -static test.c /usr/lib/libmbedcrypto.a
$ ./a.out
ae2d3b46eec018e006533da47a80e933a741a8b1320cfce7392a5472faae0216
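We can see that the program calculates the correct SHA-256 hash, since its output matches that of sha256sum. Inspecting the memory used by the program shows that it consumed at most 824 KB. Thus, we have calculated the hash of a 1 GB file with less than 1 MB of memory. This shows that we do not have to load the entire file into memory at once to calculate its hash with mbedtls.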
Keep in mind that this measurement was done on a 64-bit desktop computer, not on an embedded platform. Also, no optimizations were applied other than -O3 and static linking (the latter approximately halved the memory usage of the program). I would expect the memory footprint to be even smaller on an embedded device with a smaller address size and a toolchain that performs further optimizations.