Perl with MySQL is very slow: how can I speed it up?

Question


unit:          id, fir_name, sec_name
author:        id, name, unit_id
author_paper:  id, author_id, paper_id


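I want to unify authors ("the same author" means the names are identical and their units' fir_name values are identical), and I have to update the author_paper table at the same time.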

Here is what I did:

# formatHtml, getFirNameInUnit and unifyAuthorAndAuthorPaperTable are helpers not shown in the question
$conn->do('create index author_name on author (name)');
my $sqr = $conn->prepare("select name from author group by name having count(*) > 1");
$sqr->execute();
while (my @row = $sqr->fetchrow_array()) {
    my $dup_name = $row[0];
    $dup_name = formatHtml($dup_name);
    # one prepare and one query per duplicate name, with the name interpolated into the SQL
    my $sqr2 = $conn->prepare("select id, unit_id from author where name = '$dup_name'");
    $sqr2->execute();

    # group this name's author ids by the fir_name of their unit
    my %fir_name_hash = ();
    while (my @row2 = $sqr2->fetchrow_array()) {
        my $author_id = $row2[0];
        my $unit_id   = $row2[1];
        my $fir_name  = getFirNameInUnit($conn, $unit_id);
        push @{ $fir_name_hash{$fir_name} }, $author_id;   # autovivifies the anonymous array ref
    }

    # keep the first id in each group and merge the others into it
    while (my ($fir_name, $author_id_arr) = each(%fir_name_hash)) {
        my $count = scalar @$author_id_arr;
        next if $count == 1;
        my $author_id = $author_id_arr->[0];
        for my $i (1 .. $count - 1) {
            #print "$author_id_arr->[$i] => $author_id\n";
            unifyAuthorAndAuthorPaperTable($conn, $author_id, $author_id_arr->[$i]); #just delete in author table, and update in author_paper table
        }
    }
}


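select count(*) from author;                 # 240,000
select count(distinct(name)) from author;    # 77,000

It is very slow! I have been running it for 5 hours and it has removed only about 40,000 duplicate names. How can I make it run faster? I'd really appreciate your suggestions.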

Answer 1

dgw*_*dgw 8

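You shouldn't prepare the second SQL statement inside the loop, and once you use ? placeholders you can actually reuse that prepared statement: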

$conn->do('create index author_name on author (name)');

my $sqr = $conn->prepare('select name from author group by name having count(*) > 1');

# ? is the placeholder, and the database driver knows whether it's an integer or a string
# and quotes the input if needed.
my $sqr2 = $conn->prepare('select id, unit_id from author where name = ?');

$sqr->execute();
while (my @row = $sqr->fetchrow_array()) {
    my $dup_name = $row[0];
    $dup_name = formatHtml($dup_name);

    # Now you can reuse the prepared handle with different input
    $sqr2->execute($dup_name);

    my %fir_name_hash = ();
    while (my @row2 = $sqr2->fetchrow_array()) {
        my $author_id = $row2[0];
        my $unit_id   = $row2[1];
        my $fir_name  = getFirNameInUnit($conn, $unit_id);
        push @{ $fir_name_hash{$fir_name} }, $author_id;   # autovivifies the anonymous array ref
    }

    while (my ($fir_name, $author_id_arr) = each(%fir_name_hash)) {
        my $count = scalar @$author_id_arr;
        next if $count == 1;
        my $author_id = $author_id_arr->[0];
        for my $i (1 .. $count - 1) {
            #print "$author_id_arr->[$i] => $author_id\n";
            unifyAuthorAndAuthorPaperTable($conn, $author_id, $author_id_arr->[$i]); #just delete in author table, and update in author_paper table
        }
    }
}


This should also speed things up.

Answer 2

duf*_*ymo 5

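When I see a query and a loop, I suspect you have a latency problem: you run a query to get a set of values, then iterate over the set doing other things. If that means a database round trip for every row in the set, that adds up to a lot of latency.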
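A minimal sketch of the read side without per-row round trips, assuming the question's schema and an open DBI handle $dbh: one JOIN query fetches every author together with the fir_name of its unit, and the grouping by the question's rule (same name, same fir_name) happens in Perl. The names map_duplicates and $ALL_AUTHORS_SQL are hypothetical, and keeping the smallest id in each group is an assumption (the question keeps whichever id comes first).

```perl
use strict;
use warnings;

# One query for everything, instead of one query per duplicate name plus one
# getFirNameInUnit() call per author. Table and column names are the question's.
my $ALL_AUTHORS_SQL =
    'SELECT a.id, a.name, u.fir_name FROM author a JOIN unit u ON u.id = a.unit_id';

# $rows is an array ref of [id, name, fir_name] rows. Returns a hash ref that
# maps each duplicate author id to the id that is kept (the smallest in its group).
sub map_duplicates {
    my ($rows) = @_;
    my %ids_by_key;
    for my $row (@$rows) {
        my ($id, $name, $fir_name) = @$row;
        # "\0" as separator keeps ("ab", "c") and ("a", "bc") in different groups
        my $key = join "\0", $name, $fir_name;
        push @{ $ids_by_key{$key} }, $id;
    }
    my %keep_for;
    for my $ids (values %ids_by_key) {
        next if @$ids < 2;                       # not duplicated
        my ($keep, @dups) = sort { $a <=> $b } @$ids;
        $keep_for{$_} = $keep for @dups;
    }
    return \%keep_for;
}

# Usage with a real handle (selectall_arrayref returns an array ref of array refs):
#   my $keep_for = map_duplicates($dbh->selectall_arrayref($ALL_AUTHORS_SQL));
```

Perl hash keys compare strings exactly, while MySQL's default collations are case-insensitive, so this grouping can differ slightly from a GROUP BY name done in the database.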

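It would be better if you could do this in a single query using UPDATE with a subselect, or if you could batch these requests and execute them all in one round trip.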
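One way to batch the writes, sketched under assumptions: given a hash that maps each duplicate author id to the id being kept, the hypothetical helper build_batch_statements produces, per chunk of ids, one multi-row UPDATE (using a CASE expression) and one DELETE, each returned as [$sql, @bind] for DBI's $dbh->do($sql, undef, @bind). The default chunk size of 500 is an arbitrary choice meant to keep each statement well below MySQL's max_allowed_packet.

```perl
use strict;
use warnings;

# $keep_for: { duplicate_id => kept_id }. Returns a list of [$sql, @bind] array refs.
# Each chunk yields an UPDATE of author_paper followed by a DELETE from author,
# so papers are repointed before their old author row disappears.
sub build_batch_statements {
    my ($keep_for, $batch_size) = @_;
    $batch_size ||= 500;
    my @dup_ids = sort { $a <=> $b } keys %$keep_for;
    my @stmts;
    while (my @chunk = splice @dup_ids, 0, $batch_size) {
        my $in   = join ', ', ('?') x @chunk;               # "?, ?, ..."
        my $case = join ' ', ('WHEN ? THEN ?') x @chunk;    # one WHEN per duplicate
        push @stmts,
            [ "UPDATE author_paper SET author_id = CASE author_id $case END"
                  . " WHERE author_id IN ($in)",
              (map { ($_, $keep_for->{$_}) } @chunk),        # CASE binds: old, new
              @chunk ],                                      # IN (...) binds
            [ "DELETE FROM author WHERE id IN ($in)", @chunk ];
    }
    return @stmts;
}

# Usage, assuming $dbh is an open DBI handle:
#   for my $stmt (build_batch_statements($keep_for)) {
#       my ($sql, @bind) = @$stmt;
#       $dbh->do($sql, undef, @bind);
#   }
```

If author_paper has a unique key on (author_id, paper_id), merging two authors who share a paper will violate it, so such rows would have to be removed before the UPDATE.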

You'll get an additional speedup if you use indexes wisely: every column in a WHERE clause should have an index, and so should every foreign key.
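Applied to the tables in the question, that rule suggests roughly the following indexes. This is a sketch: the index names are made up, and author(name) is left out because the question already creates author_name. If unifyAuthorAndAuthorPaperTable (not shown) updates author_paper by author_id, as its comment suggests, the author_paper(author_id) index is probably the one that matters most, since without it every merge has to scan author_paper.

```perl
use strict;
use warnings;

# Hypothetical index names. CREATE INDEX fails if an index with the same
# name already exists, so drop any entry that is already in place.
my @INDEX_SQL = (
    'CREATE INDEX author_unit_id ON author (unit_id)',          # FK to unit.id
    'CREATE INDEX ap_author_id   ON author_paper (author_id)',  # FK to author.id
    'CREATE INDEX ap_paper_id    ON author_paper (paper_id)',   # presumably a FK to a paper table
);

# Run each statement on an open DBI handle.
sub create_indexes {
    my ($dbh) = @_;
    $dbh->do($_) for @INDEX_SQL;
    return;
}
```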

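I would run EXPLAIN on your queries (in MySQL the statement is plain EXPLAIN rather than Oracle's EXPLAIN PLAN) to see whether any full table scans are happening. If there are, you need to index properly.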
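In MySQL's EXPLAIN output the access method for each table is in the type column, and the value ALL means a full table scan. A small helper can flag those tables; scanned_tables and explain_full_scans are hypothetical names, while selectall_arrayref with { Slice => {} } is the standard DBI way to get each plan row back as a hash ref.

```perl
use strict;
use warnings;

# Given EXPLAIN rows as hash refs, return the tables read with a full scan.
sub scanned_tables {
    my ($plan) = @_;
    return map { $_->{table} } grep { ($_->{type} // '') eq 'ALL' } @$plan;
}

# Run EXPLAIN for $sql on an open DBI handle and report full table scans.
sub explain_full_scans {
    my ($dbh, $sql, @bind) = @_;
    my $plan = $dbh->selectall_arrayref("EXPLAIN $sql", { Slice => {} }, @bind);
    return scanned_tables($plan);
}

# Usage, assuming $dbh is an open DBI handle:
#   print "$_\n" for explain_full_scans($dbh, 'SELECT * FROM author_paper WHERE author_id = 42');
```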

I wonder whether a well-designed JOIN would come to your rescue.
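A sketch of the JOIN-based approach, assuming MySQL, that author.id and unit.id are primary keys, and that the smallest author id in each (name, fir_name) group is the one kept. The whole merge becomes five statements instead of a loop. author_map is a hypothetical temporary table mapping each duplicate id to its kept id; @MERGE_SQL and merge_duplicate_authors are illustrative names.

```perl
use strict;
use warnings;

my @MERGE_SQL = (
    'DROP TEMPORARY TABLE IF EXISTS author_map',
    # old_id -> keep_id for every duplicate author (same name, same unit fir_name)
    'CREATE TEMPORARY TABLE author_map AS
     SELECT a.id AS old_id, k.keep_id
     FROM author a
     JOIN unit u ON u.id = a.unit_id
     JOIN (SELECT a2.name, u2.fir_name, MIN(a2.id) AS keep_id
           FROM author a2 JOIN unit u2 ON u2.id = a2.unit_id
           GROUP BY a2.name, u2.fir_name) k
       ON k.name = a.name AND k.fir_name = u.fir_name
     WHERE a.id <> k.keep_id',
    'ALTER TABLE author_map ADD INDEX (old_id)',
    # repoint papers to the kept author first, then drop the duplicates
    'UPDATE author_paper ap JOIN author_map m ON m.old_id = ap.author_id
     SET ap.author_id = m.keep_id',
    'DELETE a FROM author a JOIN author_map m ON m.old_id = a.id',
);

# Run all statements, in order, on one open DBI handle.
sub merge_duplicate_authors {
    my ($dbh) = @_;
    $dbh->do($_) for @MERGE_SQL;
    return;
}
```

MySQL temporary tables exist only for the current connection, so all five statements must go through the same $dbh; trying it on a copy of the data first is prudent.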

240,000 rows in one table and 77,000 in another is not that big a database.

Archived: 13 years, 9 months ago
Views: 753
Last activity: 13 years, 9 months ago