How to iterate over a large result set with Perl and PostgreSQL

Tob*_*ker 3 postgresql perl dbi large-data database-cursor

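DBD::Pg, the PostgreSQL binding for Perl, always fetches the entire result set of a query. So if you iterate over a large table with a plain prepare/execute, simply running $sth->execute() pulls the whole table into memory. Neither prepared statements nor row-at-a-time calls such as fetchrow_hashref help.

The following will fail if you are working with a BIG table.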

use DBI;
my $dbh =   DBI->connect("dbi:Pg:dbname=big_db","user","password",{
        AutoCommit => 0,
        ReadOnly => 1,
        PrintError => 1,
        RaiseError =>  1,
});

my $sth = $dbh->prepare('SELECT * FROM big_table');
$sth->execute(); #prepare to run out of memory here
while (my $row = $sth->fetchrow_hashref('NAME_lc')){
  # do something with the $row hash
}
$dbh->disconnect();

Tob*_*ker 6

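To work around this, declare a cursor and then fetch the data through it in chunks. The ReadOnly and AutoCommit settings are important for this to work: with AutoCommit off, the DECLARE and the later FETCH calls run inside one transaction, which a cursor declared without WITH HOLD requires, and PostgreSQL will only use cursors for reading.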

use DBI;
my $dbh =   DBI->connect("dbi:Pg:dbname=big_db","user","password",{
        AutoCommit => 0,
        ReadOnly => 1,
        PrintError => 1,
        RaiseError =>  1,
});

$dbh->do(<<'SQL');
DECLARE mycursor CURSOR FOR
SELECT * FROM big_table
SQL

my $sth = $dbh->prepare("FETCH 1000 FROM mycursor");
while (1) {
  warn "* fetching 1000 rows\n";
  $sth->execute();
  last if $sth->rows == 0;
  while (my $row = $sth->fetchrow_hashref('NAME_lc')){
    # do something with the $row hash
  }
}
$dbh->disconnect();

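  • There should be no need to keep re-creating the statement handle inside the loop. Making that change opens the door to other optimizations, such as replacing fetchrow_hashref with a single [bind_columns](https://metacpan.org/pod/DBI#bind_columns) call outside the loop followed by a plain fetch inside it, which will be faster. (4 upvotes)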
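
The suggestion in this comment could look roughly like the sketch below. It is only an illustration under stated assumptions: `for_each_row`, the cursor name `chunk_cursor`, and the `BIG_DB_DSN` / `BIG_DB_USER` / `BIG_DB_PASS` environment variables are hypothetical names invented here, not part of DBI or DBD::Pg. The FETCH statement is prepared once, and the columns are bound to the slots of a single reused hash. The sketch rebinds after every execute(), following DBI's portability advice to call bind_col after execute; that costs one call per chunk rather than one per row.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DBI or DBD::Pg): stream a query through a
# server-side cursor and call $on_row with a reused hash ref for every row.
# Must be called inside a transaction (AutoCommit => 0), as in the answer.
sub for_each_row {
    my ($dbh, $select_sql, $chunk_size, $on_row) = @_;
    my $n = int($chunk_size);    # interpolated into SQL below, so force an integer

    # "chunk_cursor" is an arbitrary cursor name chosen for this sketch.
    $dbh->do("DECLARE chunk_cursor CURSOR FOR $select_sql");

    # Prepare the FETCH once; only execute() is repeated for each chunk.
    my $sth = $dbh->prepare("FETCH $n FROM chunk_cursor");

    my $total = 0;
    my %row;                     # one hash, overwritten in place on every fetch
    while (1) {
        $sth->execute();

        # Bind each result column to the hash slot named after it (lower case).
        $sth->bind_columns( \( @row{ @{ $sth->{NAME_lc} } } ) );

        my $in_chunk = 0;
        while ($sth->fetch) {
            $on_row->(\%row);
            $in_chunk++;
        }
        last if $in_chunk == 0;  # an empty chunk means the cursor is exhausted
        $total += $in_chunk;
    }

    $dbh->do('CLOSE chunk_cursor');
    return $total;
}

# Example run, only when a DSN is supplied (assumed environment variables),
# e.g. BIG_DB_DSN="dbi:Pg:dbname=big_db".
if (my $dsn = $ENV{BIG_DB_DSN}) {
    require DBI;                 # loaded here so the helper itself stays driver-agnostic
    my $dbh = DBI->connect($dsn, $ENV{BIG_DB_USER}, $ENV{BIG_DB_PASS}, {
        AutoCommit => 0,
        ReadOnly   => 1,
        RaiseError => 1,
    });
    my $count = for_each_row($dbh, 'SELECT * FROM big_table', 1000, sub {
        my ($row) = @_;
        # do something with the $row hash
    });
    warn "* processed $count rows\n";
    $dbh->commit();              # end the read-only transaction cleanly
    $dbh->disconnect();
}
```

Because the same hash is overwritten on each fetch, a callback that needs to keep a row around should copy it first, for example with `my %copy = %$row;`.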