fread sometimes returns wrong values for files larger than 4 GiB

Question

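I am trying to read a large binary file made up of 30e6 positions, each holding 195 doubles. Since the file is too large to read into memory all at once, I read it in chunks of 10000 positions. I then do some calculations on the chunk and read the next one...

Since I need random access to the file, I wrote a function that reads a given chunk (unsigned int chunk) from the file and stores it in **chunk_data. The function returns the total number of positions read.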

unsigned int read_chunk(double **chunk_data, unsigned int chunk) {
    FILE *in_glf_fh;
    unsigned int total_bytes_read = 0;

    // Define chunk start and end positions
    unsigned int start_pos = chunk * 10000;
    unsigned int end_pos = start_pos + 10000 - 1;
    unsigned int chunk_size = end_pos - start_pos + 1;

    // Open input file
    in_glf_fh = fopen(in_glf, "rb");
    if( in_glf_fh == NULL )
        error("ERROR: cannot open file!");

    // Search start position
    if( fseek(in_glf_fh, start_pos * 195 * sizeof(double), SEEK_SET) != 0 )
        error("ERROR: cannot seek file!");

    // Read data from file
    for(unsigned int c = 0; c < chunk_size; c++) {
         unsigned int bytes_read = fread ( (void*) chunk_data[c], sizeof(double), 195, in_glf_fh);
         if( bytes_read != 195 && !feof(in_glf_fh) )
             error("ERROR: cannot read file!");
         total_bytes_read += bytes_read;
    }

    fclose(in_glf_fh);
    return( total_bytes_read/195 );
}


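The problem is that, after reading some chunks, fread() starts returning wrong values! Also, the chunk at which fread() starts to misbehave depends on the chunk size: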

chunk of 1 pos, wrong at chunk 22025475
chunk of 10000 pos, wrong at chunk 2203
chunk of 100000 pos, wrong at chunk 221


Does anyone know what might be going on?

Answer 1

wal*_*lyk 5

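Having established that 30e6 positions is not hexadecimal but 30,000,000: consider the problem with fseek(). The file has 46,800,000,000 bytes (30,000,000 × 195 × 8). Plain fseek() takes its offset as a long, so on platforms where long is 32 bits (typical of 16-bit and 32-bit platforms) it can only reach the first 2^31-1 bytes (2,147,483,647), which is far short of this file's size.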
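The chunk numbers reported in the question can be checked against a 32-bit limit with a little arithmetic. The sketch below assumes that unsigned int is 32 bits wide and that the product start_pos * 195 is evaluated in that type before being widened. Under those assumptions, the first chunk whose element index no longer fits in 32 bits comes out as 2203 for chunks of 10000 positions and 221 for chunks of 100000, matching the report, and as 22025474 for chunks of one position, one less than the reported 22025475. The helper names are hypothetical and not part of the original code.

```c
#include <assert.h>
#include <stdint.h>

#define DOUBLES_PER_POS 195u

/* Element index of a chunk's first double, computed in 32-bit unsigned
   arithmetic. Unsigned overflow is well defined in C: the result is
   reduced modulo 2^32, so large chunks silently map to a wrong index. */
uint32_t element_index_32(uint32_t chunk, uint32_t chunk_size) {
    return chunk * chunk_size * DOUBLES_PER_POS;
}

/* The same index computed in 64-bit arithmetic, which does not wrap
   for a file of this size. */
uint64_t element_index_64(uint32_t chunk, uint32_t chunk_size) {
    return (uint64_t)chunk * chunk_size * DOUBLES_PER_POS;
}

/* Smallest chunk number whose element index reaches 2^32, i.e. the
   first chunk for which the 32-bit computation wraps around. */
uint32_t first_wrapping_chunk(uint32_t chunk_size) {
    assert(chunk_size > 0);
    const uint64_t limit = (uint64_t)1 << 32;
    const uint64_t per_chunk = (uint64_t)chunk_size * DOUBLES_PER_POS;
    return (uint32_t)((limit + per_chunk - 1) / per_chunk); /* ceiling division */
}
```

For example, element_index_64(2203, 10000) is 4,295,850,000, while element_index_32(2203, 10000) wraps to 882,704, so a seek based on the 32-bit value would land near the start of the file instead of at the requested chunk.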

Depending on the platform the program runs on, you may have to use lseek64 or its platform equivalent. On Linux, there are:

lseek() together with #define _FILE_OFFSET_BITS 64
llseek()
lseek64()
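Since the code in the question uses FILE* rather than file descriptors, a minimal sketch of the same large-file idea for stdio might look like the following. It assumes a POSIX system and swaps in fseeko(), which is not named above: it is the stdio counterpart of lseek() and takes an off_t, which is 64 bits wide when _FILE_OFFSET_BITS is defined as 64. The helper names chunk_byte_offset and seek_to_chunk are hypothetical, and the offset is computed in 64-bit arithmetic so that the multiplication itself cannot wrap.

```c
/* Both macros must be defined before any system header is included. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L

#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

#define DOUBLES_PER_POS 195

/* Byte offset of the first double of a chunk. The operands are widened
   to 64 bits first, so the product does not overflow for this file. */
int64_t chunk_byte_offset(uint64_t chunk, uint64_t chunk_size) {
    return (int64_t)(chunk * chunk_size * DOUBLES_PER_POS * sizeof(double));
}

/* Seek to the start of a chunk using a 64-bit off_t.
   Returns 0 on success and -1 on failure, like fseeko() itself. */
int seek_to_chunk(FILE *fh, unsigned int chunk, unsigned int chunk_size) {
    off_t offset = (off_t)chunk_byte_offset(chunk, chunk_size);
    return fseeko(fh, offset, SEEK_SET);
}
```

In the question's read_chunk(), the fseek() call would then become something like seek_to_chunk(in_glf_fh, chunk, 10000), with the rest of the function unchanged.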
