How to load 100 million rows into memory

rav*_*rab 4 java sql jdbc out-of-memory

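I need to load more than 100 million rows from a MySQL database into memory. My Java program fails with java.lang.OutOfMemoryError: Java heap space. My machine has 8 GB of RAM, and I have set -Xmx6144m in my JVM options.

Here is my code: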

public List<Record> loadTrainingDataSet() {

    ArrayList<Record> records = new ArrayList<Record>();
    try {
        Statement s = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
        s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings");
        ResultSet rs = s.getResultSet();
        int count = 0;
        while (rs.next()) {

Any idea how to overcome this problem?


UPDATE

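I found this post and updated my code based on the comments below. I now seem to be able to load the data into memory with the same -Xmx6144m setting, but it takes a very long time.

Here is my code: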

...
import org.apache.mahout.math.SparseMatrix;
...

@Override
public SparseMatrix loadTrainingDataSet() {
    long t1 = System.currentTimeMillis();
    SparseMatrix ratings = new SparseMatrix(NUM_ROWS,NUM_COLS);
    int REC_START = 0;
    int REC_END = 0;

    try {
        for (int i = 1; i <= 101; i++) {
            long t11 = System.currentTimeMillis();
            REC_END = 1000000 * i;
            Statement s = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                    java.sql.ResultSet.CONCUR_READ_ONLY);
            s.setFetchSize(Integer.MIN_VALUE);
            ResultSet rs = s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings LIMIT " + REC_START + "," + REC_END);//100480507
            while (rs.next()) {
                int movieId = rs.getInt("movie_id");
                int customerId = rs.getInt("customer_id");
                byte rating = (byte) rs.getInt("rating");
                ratings.set(customerId,movieId,rating);
            }
            long t22 = System.currentTimeMillis();
            System.out.println("Round " + i + " completed " + (t22 - t11) / 1000 + " seconds");
            rs.close();
            s.close();
        }

    } catch (Exception e) {
        System.err.println("Cannot connect to database server " + e);
    } finally {
        if (conn != null) {
            try {
                conn.close();
                System.out.println("Database connection terminated");
            } catch (Exception e) { /* ignore close errors */ }
        }
    }
    long t2 = System.currentTimeMillis();
    System.out.println(" Took " + (t2 - t1) / 1000 + " seconds");
    return ratings;
}

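Loading the first 100,000 rows takes 2 seconds; by the 29th batch of 100,000 rows, it takes 46 seconds. I stopped the process partway through because it was taking too long. Are these timings acceptable? Is there any way to improve the performance of this code? I am running it on a 64-bit Windows machine with 8 GB of RAM.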
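A note on the paging in the updated code: MySQL reads `LIMIT a,b` as "skip `a` rows, then return `b` rows". Because `REC_START` is never advanced and `REC_END` grows by one million each round, round `i` asks for the first `i` million rows again rather than the next million. The sketch below only does the arithmetic. It assumes the `//100480507` comment in the query line is the table's row count, and `LimitWindow`, `totalRowsRead`, and `pagedQuery` are hypothetical names introduced for illustration.

```java
public class LimitWindow {

    /** Rows requested by round i of the loop above: offset stays 0, count is 1,000,000 * i. */
    static long rowsRequested(int round) {
        return 1_000_000L * round;
    }

    /** Total rows the server returns over all rounds, capped at the table size. */
    static long totalRowsRead(int rounds, long tableRows) {
        long total = 0;
        for (int i = 1; i <= rounds; i++) {
            total += Math.min(rowsRequested(i), tableRows);
        }
        return total;
    }

    /** A non-overlapping window: offset advances by one batch per round, count stays fixed. */
    static String pagedQuery(int round, int batchSize) {
        long offset = (long) (round - 1) * batchSize;
        return "SELECT movie_id,customer_id,rating FROM ratings LIMIT " + offset + "," + batchSize;
    }

    public static void main(String[] args) {
        // Assumed row count, taken from the comment in the question's code.
        long tableRows = 100_480_507L;
        System.out.println("Rows read by the loop as written: " + totalRowsRead(101, tableRows));
        System.out.println("Round 3 with a fixed-size window: " + pagedQuery(3, 1_000_000));
    }
}
```

Under these assumptions the loop reads roughly 51 times the table, which would be consistent with each round taking longer than the last. A fixed-size window avoids re-reading rows, though without an `ORDER BY` the row order across pages is not guaranteed, and large offsets still make the server skip over the preceding rows.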

Mar*_*nik 11

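One hundred million records means that each record may take up at most 50 bytes in order to fit within 6 GB plus some extra space for other allocations. In Java, 50 bytes is nothing; a mere Object[] takes 32 bytes per element. You must find a way to use the results immediately inside your while (rs.next()) loop instead of retaining them in full.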
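The per-record budget can be checked with a quick calculation. This is a minimal sketch: the heap size comes from the `-Xmx6144m` option in the question, the row count is assumed to be the `100480507` that appears as a comment in the question's query, and `HeapBudget` is a hypothetical name.

```java
public class HeapBudget {

    /** Whole-heap budget per record, in bytes, ignoring every other allocation. */
    static long bytesPerRecord(long heapBytes, long rows) {
        return heapBytes / rows;
    }

    public static void main(String[] args) {
        long heapBytes = 6144L * 1024 * 1024; // -Xmx6144m
        long rows = 100_480_507L;             // assumed row count from the question's comment
        System.out.println("Bytes available per record: " + bytesPerRecord(heapBytes, rows));
    }
}
```

Dividing the entire heap across the rows gives 64 bytes each, so leaving headroom for the driver, the collections themselves, and garbage-collection overhead brings the usable figure down to about the 50 bytes mentioned above.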

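  • By default, the MySQL JDBC driver loads all rows into the ResultSet. You need to specify `setFetchSize(Integer.MIN_VALUE)` to actually make it fetch row by row. (4 upvotes)
  • @AlanB MySQL ignores all values of `setFetchSize` except `Integer.MIN_VALUE`; see the [implementation notes](http://dev.mysql.com/doc/refman/5.5/en/connector-j-reference-implementation-notes.html), under `ResultSet`. (2 upvotes)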
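Putting the answer and the comments together, a streaming read might look like the sketch below. It assumes an open MySQL Connector/J `Connection` and the `ratings` table from the question. `StreamingRatings` and `RatingStats` are hypothetical names, and the running mean is only a stand-in for whatever per-row work the application actually needs.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingRatings {

    /** Hypothetical per-row consumer: keeps running totals, never the rows themselves. */
    static final class RatingStats {
        long count;
        long sum;

        void accept(int movieId, int customerId, int rating) {
            count++;
            sum += rating;
        }

        double mean() {
            return count == 0 ? 0.0 : (double) sum / count;
        }
    }

    /** Streams the whole table once, handing each row to the consumer as it arrives. */
    static RatingStats stream(Connection conn) throws SQLException {
        RatingStats stats = new RatingStats();
        // Forward-only, read-only, fetch size Integer.MIN_VALUE: the combination
        // Connector/J documents for row-by-row streaming.
        try (Statement s = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                                ResultSet.CONCUR_READ_ONLY)) {
            s.setFetchSize(Integer.MIN_VALUE);
            try (ResultSet rs = s.executeQuery(
                    "SELECT movie_id, customer_id, rating FROM ratings")) {
                while (rs.next()) {
                    stats.accept(rs.getInt("movie_id"),
                                 rs.getInt("customer_id"),
                                 rs.getInt("rating"));
                }
            }
        }
        return stats;
    }
}
```

With streaming in place a single query covers the whole table, so no `LIMIT` paging is needed. One caveat from the same implementation notes: while a streaming result set is open, the connection cannot run other statements until all rows are read or the result set is closed.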