Java array with more than 4 GB of elements

Omr*_*dan 19 java arrays 64-bit

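I have a large file, expected to be around 12 GB. I want to load all of it into memory on a beefy 64-bit machine with 16 GB of RAM, but I think Java does not support byte arrays that big: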

File f = new File(file);
long size = f.length();
byte data[] = new byte[size]; // <- does not compile, not even on 64bit JVM

Is this possible in Java?

The compile error from the Eclipse compiler is:

Type mismatch: cannot convert from long to int

javac gives:

possible loss of precision
found   : long
required: int
         byte data[] = new byte[size];

Bil*_*ard 20

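Java array indices are of type int (4 bytes, or 32 bits), so I'm afraid you're limited to 2^31 - 1, or 2147483647, slots in your array. I'd read the data into another data structure, like a 2D array.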
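
To make the limit concrete, here is a minimal sketch of how many rows a 2D array would need for a file of the size in the question. The class name `ArrayLimitDemo`, the helper `chunksNeeded`, and the 1 GiB row size are assumptions chosen for illustration, not something the answer prescribes.

```java
class ArrayLimitDemo {
    // Number of rows of length chunkSize needed to hold totalBytes (rounds up).
    static int chunksNeeded(long totalBytes, int chunkSize) {
        return (int) ((totalBytes + chunkSize - 1) / chunkSize);
    }

    public static void main(String[] args) {
        long fileSize = 12L * 1024 * 1024 * 1024;   // 12 GiB, as in the question
        int chunkSize = 1 << 30;                    // assumed row size: 1 GiB

        // An int index cannot address the whole file in a single array.
        System.out.println("Largest int index bound: " + Integer.MAX_VALUE);
        System.out.println("File exceeds it: " + (fileSize > Integer.MAX_VALUE));

        // Rows a byte[][] would need; the rows themselves are not allocated here.
        System.out.println("Rows needed: " + chunksNeeded(fileSize, chunkSize));
    }
}
```

Each row stays well under the int limit, and a long position maps to a row via `position / chunkSize` and to a column via `position % chunkSize`.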


Anonymous 13

package com.deans.rtl.util;

import java.io.FileInputStream;
import java.io.IOException;

/**
 * 
 * @author william.deans@gmail.com
 *
 * Written to work with byte arrays requiring address space larger than 32 bits. 
 * 
 */

public class ByteArray64 {

    private static final long CHUNK_SIZE = 1024*1024*1024; //1GiB

    long size;
    byte [][] data;

    public ByteArray64( long size ) {
        this.size = size;
        if( size == 0 ) {
            data = null;
        } else {
            int chunks = (int)(size/CHUNK_SIZE);
            int remainder = (int)(size - ((long)chunks)*CHUNK_SIZE);
            data = new byte[chunks+(remainder==0?0:1)][];
            for( int idx=chunks; --idx>=0; ) {
                data[idx] = new byte[(int)CHUNK_SIZE];
            }
            if( remainder != 0 ) {
                data[chunks] = new byte[remainder];
            }
        }
    }
    public byte get( long index ) {
        if( index<0 || index>=size ) {
            throw new IndexOutOfBoundsException("Error attempting to access data element "+index+".  Array is "+size+" elements long.");
        }
        int chunk = (int)(index/CHUNK_SIZE);
        int offset = (int)(index - (((long)chunk)*CHUNK_SIZE));
        return data[chunk][offset];
    }
    public void set( long index, byte b ) {
        if( index<0 || index>=size ) {
            throw new IndexOutOfBoundsException("Error attempting to access data element "+index+".  Array is "+size+" elements long.");
        }
        int chunk = (int)(index/CHUNK_SIZE);
        int offset = (int)(index - (((long)chunk)*CHUNK_SIZE));
        data[chunk][offset] = b;
    }
    /**
     * Simulates a single read which fills the entire array via several smaller reads.
     * 
     * @param fileInputStream
     * @throws IOException
     */
    public void read( FileInputStream fileInputStream ) throws IOException {
        if( size == 0 ) {
            return;
        }
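        for( int idx=0; idx<data.length; idx++ ) {
            // read() may return fewer bytes than requested, so keep reading until the chunk is full
            int filled = 0;
            while( filled < data[idx].length ) {
                int n = fileInputStream.read( data[idx], filled, data[idx].length - filled );
                if( n < 0 ) {
                    throw new IOException("short read");
                }
                filled += n;
            }
        }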
    }
    public long size() {
        return size;
    }
}


Yes*_*ke. 6

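If necessary, you can load the data into an array of arrays, which gives you a maximum of Integer.MAX_VALUE squared bytes, more than even the most powerful machine could hold in memory.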
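
As a quick check of that figure, the sketch below computes the theoretical capacity of a byte[][] whose outer and inner lengths are both Integer.MAX_VALUE. The class and method names are made up for illustration, and real JVMs typically cap array lengths slightly below Integer.MAX_VALUE, so treat the result as an upper bound only.

```java
class JaggedCapacityDemo {
    // Theoretical byte capacity of a byte[][] with Integer.MAX_VALUE rows
    // of Integer.MAX_VALUE bytes each. (2^31 - 1)^2 still fits in a long.
    static long maxJaggedBytes() {
        long max = Integer.MAX_VALUE;
        return max * max;
    }

    public static void main(String[] args) {
        long capacity = maxJaggedBytes();
        System.out.println(capacity);                  // 4611686014132420609
        System.out.println(capacity / (1L << 60));     // whole exbibytes: 3
    }
}
```

That is roughly 4 EiB, far beyond the 12 GB file in the question.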