rav*_*rab 4 java sql jdbc out-of-memory
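I need to load more than 100 million rows from a MySQL database into memory. My Java program fails with java.lang.OutOfMemoryError: Java heap space.
My machine has 8 GB of RAM, and I have set -Xmx6144m in my JVM options.
This is my code: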
public List<Record> loadTrainingDataSet() {
    ArrayList<Record> records = new ArrayList<Record>();
    try {
        Statement s = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
        s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings");
        ResultSet rs = s.getResultSet();
        int count = 0;
        while (rs.next()) {
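Any idea how to overcome this problem?
UPDATE: I came across this post and, based on it and the comments below, updated my code. I now seem to be able to load the data into memory with the same -Xmx6144m setting, but it takes a very long time.
Here is my code.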
...
import org.apache.mahout.math.SparseMatrix;
...
@Override
public SparseMatrix loadTrainingDataSet() {
    long t1 = System.currentTimeMillis();
    SparseMatrix ratings = new SparseMatrix(NUM_ROWS,NUM_COLS);
    int REC_START = 0;
    int REC_END = 0;
    try {
        for (int i = 1; i <= 101; i++) {
            long t11 = System.currentTimeMillis();
            REC_END = 1000000 * i;
            Statement s = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                    java.sql.ResultSet.CONCUR_READ_ONLY);
            s.setFetchSize(Integer.MIN_VALUE);
            ResultSet rs = s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings LIMIT " + REC_START + "," + REC_END);//100480507
            while (rs.next()) {
                int movieId = rs.getInt("movie_id");
                int customerId = rs.getInt("customer_id");
                byte rating = (byte) rs.getInt("rating");
                ratings.set(customerId,movieId,rating);
            }
            long t22 = System.currentTimeMillis();
            System.out.println("Round " + i + " completed " + (t22 - t11) / 1000 + " seconds");
            rs.close();
            s.close();
        }
    } catch (Exception e) {
        System.err.println("Cannot connect to database server " + e);
    } finally {
        if (conn != null) {
            try {
                conn.close();
                System.out.println("Database connection terminated");
            } catch (Exception e) { /* ignore close errors */ }
        }
    }
    long t2 = System.currentTimeMillis();
    System.out.println(" Took " + (t2 - t1) / 1000 + " seconds");
    return ratings;
}
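Loading the first 100,000 rows takes 2 seconds. Loading the 29th batch of 100,000 rows takes 46 seconds. I stopped the process partway through because it was taking too long. Are these acceptable times? Is there a way to improve the performance of this code? I am running it on a 64-bit Windows machine with 8 GB of RAM.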
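A note on the LIMIT clause in the loop as posted: MySQL reads `LIMIT a,b` as offset `a` and row count `b`. Since `REC_START` is never advanced and `REC_END` grows each round, round i asks for the first 1000000 * i rows again, which would be consistent with each round taking longer than the last. The sketch below shows one way the windows could be computed with a fixed page size instead. `LimitPaging`, `pageQueries` and `rowsRequestedAsPosted` are hypothetical names, and the page size is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class LimitPaging {

    /**
     * Builds one query per page using MySQL's "LIMIT offset,row_count" form.
     * The offset advances by pageSize; the row count stays fixed except on the last page.
     */
    static List<String> pageQueries(String baseQuery, long totalRows, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (long offset = 0; offset < totalRows; offset += pageSize) {
            long rowCount = Math.min(pageSize, totalRows - offset);
            queries.add(baseQuery + " LIMIT " + offset + "," + rowCount);
        }
        return queries;
    }

    /**
     * Rows requested in total when the offset stays 0 and the row count is step * i
     * for i = 1..rounds, as in the loop from the question.
     */
    static long rowsRequestedAsPosted(int rounds, long step) {
        long total = 0;
        for (int i = 1; i <= rounds; i++) {
            total += step * i;
        }
        return total;
    }

    public static void main(String[] args) {
        // Small illustration with made-up numbers: 25 rows in pages of 10.
        for (String q : pageQueries("SELECT movie_id,customer_id,rating FROM ratings", 25, 10)) {
            System.out.println(q);
        }
        System.out.println("Rows requested by the posted loop: "
                + rowsRequestedAsPosted(101, 1_000_000L));
    }
}
```

Because `setFetchSize(Integer.MIN_VALUE)` already asks the driver to stream rows, a single unpaged query may be simpler still. Paging without an ORDER BY also does not guarantee a stable row order between queries.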
Mar*_*nik 11
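A hundred million records means that each record may take up at most 50 bytes in order to fit within 6 GB plus some extra space for other allocations. In Java, 50 bytes is nothing; a mere Object[] spends 32 bytes per element. You must find a way to use the results immediately inside your while (rs.next()) loop rather than retaining them in full.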
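To make the budget concrete: 6144 MB divided by 100 million records is about 64 bytes each before any headroom, which is in line with the roughly 50 bytes mentioned above. The sketch below shows one possible shape for consuming rows inside the loop without retaining them. `RatingConsumer`, `RatingStats`, `bytesPerRecord` and `streamRatings` are hypothetical names, the running mean is only a stand-in for whatever computation is actually needed, and the table and column names are taken from the question.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingRatings {

    /** Hypothetical callback: receives one row at a time. */
    interface RatingConsumer {
        void accept(int movieId, int customerId, byte rating);
    }

    /** Example consumer: keeps only a running count and sum, so memory use is constant. */
    static final class RatingStats implements RatingConsumer {
        long count;
        long sum;

        @Override
        public void accept(int movieId, int customerId, byte rating) {
            count++;
            sum += rating;
        }

        double mean() {
            return count == 0 ? 0.0 : (double) sum / count;
        }
    }

    /** Heap budget per record in whole bytes (integer division). */
    static long bytesPerRecord(long heapBytes, long records) {
        return heapBytes / records;
    }

    /** Feeds every row of the ratings table to the consumer; no row is stored here. */
    static void streamRatings(Connection conn, RatingConsumer consumer) throws SQLException {
        try (Statement s = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY)) {
            // Same streaming hint as in the question's updated code (MySQL Connector/J).
            s.setFetchSize(Integer.MIN_VALUE);
            try (ResultSet rs = s.executeQuery(
                    "SELECT movie_id,customer_id,rating FROM ratings")) {
                while (rs.next()) {
                    consumer.accept(rs.getInt("movie_id"),
                            rs.getInt("customer_id"),
                            (byte) rs.getInt("rating"));
                }
            }
        }
    }

    public static void main(String[] args) {
        // Budget arithmetic for -Xmx6144m and 100 million records.
        System.out.println(bytesPerRecord(6144L * 1024 * 1024, 100_000_000L) + " bytes per record");

        // The consumer can be exercised without a database, using made-up rows.
        RatingStats stats = new RatingStats();
        stats.accept(1, 10, (byte) 3);
        stats.accept(2, 10, (byte) 5);
        System.out.println("mean rating = " + stats.mean());
    }
}
```

With a live connection the call would be `streamRatings(conn, stats)`. Whether a streaming aggregate like this is enough depends on the algorithm; if the full data set really has to stay resident, the per-record budget above is the constraint to design against.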