lhd*_*ver 1 mysql perl performance
unit: id, fir_name, sec_name
author: id, name, unit_id
author_paper: id, author_id, paper_id
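I want to unify the authors ("the same author" means the names are identical and the fir_name values of their units are identical), and I have to update the author_paper table at the same time.
Here is what I did: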
$conn->do('create index author_name on author (name)');
my $sqr = $conn->prepare("select name from author group by name having count(*) > 1");
$sqr->execute();
while(my @row = $sqr->fetchrow_array()) {
    my $dup_name = $row[0];
    $dup_name = formatHtml($dup_name);
    my $sqr2 = $conn->prepare("select id, unit_id from author where name = '$dup_name'");
    $sqr2->execute();
    my %fir_name_hash = ();
    while(my @row2 = $sqr2->fetchrow_array()) {
        my $author_id = $row2[0];
        my $unit_id = $row2[1];
        my $fir_name = getFirNameInUnit($conn, $unit_id);
        if (not exists $fir_name_hash{$fir_name}) {
            $fir_name_hash{$fir_name} = []; #anonymous arr reference
        }
        $x = $fir_name_hash{$fir_name};
        push @$x, $author_id;
    }
    while(my ($fir_name, $author_id_arr) = each(%fir_name_hash)) {
        my $count = scalar @$author_id_arr;
        if ($count == 1) {next;}
        my $author_id = $author_id_arr->[0];
        for ($i = 1; $i < $count; $i++) {
            #print "$author_id_arr->[$i] => $author_id\n";
            unifyAuthorAndAuthorPaperTable($conn, $author_id, $author_id_arr->[$i]); #just delete in author table, and update in author_paper table
        }
    }
}
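select count(*) from author; # 240,000
select count(distinct(name)) from author; # 77,000
It is very slow! I have run it for 5 hours, and it has only removed about 40,000 duplicate names. How can I make it run faster? I am eager to hear your suggestions.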
You should not prepare the second SQL statement inside the loop, and you only really benefit from preparing when you use ? placeholders:
$conn->do('create index author_name on author (name)');
my $sqr = $conn->prepare('select name from author group by name having count(*) > 1');
# ? is the placeholder; the database driver knows whether it's an integer or a string
# and quotes the input if needed.
my $sqr2 = $conn->prepare('select id, unit_id from author where name = ?');
$sqr->execute();
while(my @row = $sqr->fetchrow_array()) {
    my $dup_name = $row[0];
    $dup_name = formatHtml($dup_name);
    # Now you can reuse the prepared handle with different input
    $sqr2->execute( $dup_name );
    my %fir_name_hash = ();
    while(my @row2 = $sqr2->fetchrow_array()) {
        my $author_id = $row2[0];
        my $unit_id = $row2[1];
        my $fir_name = getFirNameInUnit($conn, $unit_id);
        if (not exists $fir_name_hash{$fir_name}) {
            $fir_name_hash{$fir_name} = []; #anonymous arr reference
        }
        # "my" keeps these variables lexical so the script also runs under "use strict"
        my $x = $fir_name_hash{$fir_name};
        push @$x, $author_id;
    }
    while(my ($fir_name, $author_id_arr) = each(%fir_name_hash)) {
        my $count = scalar @$author_id_arr;
        if ($count == 1) {next;}
        my $author_id = $author_id_arr->[0];
        for (my $i = 1; $i < $count; $i++) {
            #print "$author_id_arr->[$i] => $author_id\n";
            unifyAuthorAndAuthorPaperTable($conn, $author_id, $author_id_arr->[$i]); #just delete in author table, and update in author_paper table
        }
    }
}
This should also speed things up.
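To make the placeholder point concrete, here is a minimal sketch. It is written in Python against an in-memory SQLite database purely so it can be run without a MySQL server; that substitution, the sample rows, and the helper name ids_for are assumptions, not part of the original setup. The same idea carries over to the Perl DBI handle above: one statement text with a ?, and the value passed separately on each execution.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute("create table author (id integer primary key, name text, unit_id integer)")
# Made-up sample rows, chosen so that one name contains an apostrophe.
conn.executemany(
    "insert into author (id, name, unit_id) values (?, ?, ?)",
    [(1, "O'Brien", 10), (2, "O'Brien", 11), (3, "Li Wei", 10)],
)

# One statement text with a placeholder, defined once outside any loop.
LOOKUP_SQL = "select id, unit_id from author where name = ?"

def ids_for(name):
    """Return the ids of all authors whose name matches exactly (hypothetical helper)."""
    # The driver binds the value, so the apostrophe needs no manual escaping.
    return [row[0] for row in conn.execute(LOOKUP_SQL, (name,))]

obrien_ids = sorted(ids_for("O'Brien"))

# The string-interpolated form used in the question breaks on the same input.
try:
    conn.execute("select id from author where name = '%s'" % "O'Brien")
    interpolation_failed = False
except sqlite3.OperationalError:
    interpolation_failed = True
```

Besides avoiding the repeated prepare, binding values this way removes the need to escape names by hand before building the SQL string.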
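When I see a query and a loop, I think you have a latency problem: you query to get a set of values, then iterate over the set to do something else. If that means a database round trip for each row in the set, that is a lot of latency.
It would be better if you could do this in a single query using an UPDATE and a sub-select, or if you could batch those requests and perform all of them in one round trip.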
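One way the UPDATE-with-sub-select idea could look is sketched below. It assumes the three tables from the question, keeps the lowest id in each (name, fir_name) group, and uses two scratch tables (keeper and author_map, both hypothetical names). It is run here through Python on in-memory SQLite with made-up rows so that it is self-contained; the SQL avoids vendor-specific syntax, but it has not been run against MySQL and may need adjusting there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.executescript("""
create table unit (id integer primary key, fir_name text, sec_name text);
create table author (id integer primary key, name text, unit_id integer);
create table author_paper (id integer primary key, author_id integer, paper_id integer);

-- Made-up sample data: authors 1 and 2 share both name and fir_name,
-- author 3 has the same name but a different fir_name.
insert into unit values (1, 'Univ A', 'CS'), (2, 'Univ A', 'EE'), (3, 'Univ B', 'CS');
insert into author values (1, 'Ann Lee', 1), (2, 'Ann Lee', 2),
                          (3, 'Ann Lee', 3), (4, 'Bo Chen', 1);
insert into author_paper values (1, 1, 100), (2, 2, 101), (3, 3, 102), (4, 4, 103);
""")

MERGE_SQL = """
-- 1. One surviving id per (name, fir_name) group.
create table keeper as
select a.name as name, u.fir_name as fir_name, min(a.id) as keep_id
from author a join unit u on u.id = a.unit_id
group by a.name, u.fir_name;

-- 2. Map every duplicate author id to the id that survives.
create table author_map as
select a.id as old_id, k.keep_id as keep_id
from author a
join unit u on u.id = a.unit_id
join keeper k on k.name = a.name and k.fir_name = u.fir_name
where a.id <> k.keep_id;

-- 3. Repoint the papers, then delete the duplicates: two set-based statements.
update author_paper
set author_id = (select keep_id from author_map where old_id = author_paper.author_id)
where author_id in (select old_id from author_map);

delete from author where id in (select old_id from author_map);

drop table author_map;
drop table keeper;
"""
conn.executescript(MERGE_SQL)

authors = [r[0] for r in conn.execute("select id from author order by id")]
links = conn.execute(
    "select paper_id, author_id from author_paper order by paper_id"
).fetchall()
```

The whole merge is a fixed number of statements regardless of how many duplicate names exist, which is what removes the per-row round trips. Authors whose unit_id matches no unit row are left untouched by the inner joins.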
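You will get an additional speed-up if you use indexes wisely. Every column in a WHERE clause should have an index. Every foreign key should have an index.
I would run EXPLAIN on your queries (MySQL spells it EXPLAIN, not EXPLAIN PLAN) and see whether any full table scans are going on. If there are, you have to index properly.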
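The effect of an index on the query plan can be checked directly. The sketch below uses SQLite's EXPLAIN QUERY PLAN from Python as a runnable stand-in (an assumption; the helper name plan is hypothetical). In MySQL the equivalent is to prefix the query with EXPLAIN, where a full table scan shows up as type ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.execute("create table author (id integer primary key, name text, unit_id integer)")

QUERY = "select id, unit_id from author where name = ?"

def plan(sql, params):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    rows = conn.execute("explain query plan " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

before = plan(QUERY, ("Ann Lee",))  # no index on name yet: the table is scanned
conn.execute("create index author_name on author (name)")
after = plan(QUERY, ("Ann Lee",))   # the plan now searches via author_name

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, so the thing to look for is the switch from a scan to an index search rather than a particular string.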
I wonder whether a properly designed JOIN would come to your rescue?
240,000 rows in one table and 77,000 in another is not that big a database.
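As a rough sketch of the JOIN idea: a single query can return every author together with the fir_name of its unit, which would replace the per-row getFirNameInUnit lookup, and the grouping can then happen in memory. The example uses Python with in-memory SQLite and made-up rows so it can be run as-is; those choices and the variable names are assumptions.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.executescript("""
create table unit (id integer primary key, fir_name text, sec_name text);
create table author (id integer primary key, name text, unit_id integer);
-- Made-up sample data.
insert into unit values (1, 'Univ A', 'CS'), (2, 'Univ A', 'EE'), (3, 'Univ B', 'CS');
insert into author values (1, 'Ann Lee', 1), (2, 'Ann Lee', 2),
                          (3, 'Ann Lee', 3), (4, 'Bo Chen', 1);
""")

# One round trip: each author row already carries its unit's fir_name.
JOIN_SQL = """
select a.name, u.fir_name, a.id
from author a
join unit u on u.id = a.unit_id
order by a.id
"""

# Group author ids by (name, fir_name), the question's definition of "same author".
groups = defaultdict(list)
for name, fir_name, author_id in conn.execute(JOIN_SQL):
    groups[(name, fir_name)].append(author_id)

# (keep_id, duplicate_id) pairs: the first id in each group survives.
merges = [(ids[0], dup) for ids in groups.values() for dup in ids[1:]]
print(merges)
```

Each pair in merges corresponds to one call of unifyAuthorAndAuthorPaperTable in the original loop; at this table size the whole result set should fit comfortably in memory.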