如何在不重新访问链接的情况下递归访问链接?

Gle*_*rry 0 recursion perl

我想检查一个站点的链接,然后递归检查这些站点的链接.但我不想两次获取同一页面.我遇到了逻辑问题.这是Perl代码:

my %urls_to_check = ();
my %checked_urls = ();

&fetch_and_parse($starting_url);

use Data::Dumper; die Dumper(\%checked_urls, \%urls_to_check);

sub fetch_and_parse {
    my ($url) = @_;

    if ($checked_urls{$url} > 1) { return 0; }
    warn "Fetching 'me' links from $url";

    my $p = HTML::TreeBuilder->new;

    my $req = HTTP::Request->new(GET => $url);
    my $res = $ua->request($req, sub { $p->parse($_[0])});
    $p->eof();

    my $base = $res->base;

    my @tags = $p->look_down(
        "_tag", "a",
    );

    foreach my $e (@tags) {
        my $full = url($e->attr('href'), $base)->abs;
        $urls_to_check{$full} = 1 if (!defined($checked_urls{$full}));
    }

    foreach my $url (keys %urls_to_check) {
        delete $urls_to_check{$url};
        $checked_urls{$url}++;
        &fetch_and_parse($url);
    }
}
Run Code Online (Sandbox Code Playgroud)

但这似乎并没有真正做到我想要的.

救命?!

编辑:我想从中获取URL $starting_url,然后从生成的提取中获取任何和所有URL.但是,如果其中一个URL链接回来$starting_url,我不想再次获取它.

Que*_*tin 9

最简单的方法是不重新发明轮子并使用CPAN.