Posts by use*_*810

Perl: extracting a pattern from an HTML file

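I have an .html file full of links, and I want to extract the domain names without the http:// (so just the hostname part of each link, e.g. blah.com), list them, and remove duplicates.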

Here is what I have come up with so far. I think the problem is the way I am trying to pass the $tree data.

#!/usr/local/bin/perl -w

use HTML::TreeBuilder 5 -weak; # Ensure weak references in use
use URI;
foreach my $file_name (@ARGV) {
  my $tree = HTML::TreeBuilder->new; # empty tree
  $tree->parse_file($file_name);
  my $u1 = URI->new($tree);
  print "host: ", $u1->host, "\n";
  print "Hey, here's a dump of the parse tree of $file_name:\n";

  # Now that we're done with it, we must destroy it.
  # $tree = $tree->delete; # Not required with weak references
}
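
The likely culprit in the attempt above is `URI->new($tree)`: `URI->new` expects a URL string, while `$tree` is the parsed document as a whole. One possible direction, offered as a minimal sketch rather than a definitive fix, is to walk the tree for `<a>` elements, build a `URI` object from each `href`, and track hosts already printed in a hash. The helper name `hosts_in_file` and the `%seen` hash are illustrative choices, and the sketch assumes that only absolute http/https links inside `<a href="...">` matter (relative links, `mailto:` links, and other tags are skipped).

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder 5 -weak;   # weak references, so no explicit delete is needed
use URI;

# hosts_in_file is a hypothetical helper: it returns the host of every
# absolute http/https link found in an <a href="..."> of one HTML file.
# Duplicates are NOT removed here; the caller handles that.
sub hosts_in_file {
    my ($file_name) = @_;
    my $tree = HTML::TreeBuilder->new_from_file($file_name);

    my @hosts;
    # In list context, look_down returns every element matching the criteria:
    # here, <a> elements whose href attribute is non-empty.
    for my $anchor ($tree->look_down(_tag => 'a', href => qr/./)) {
        my $uri = URI->new($anchor->attr('href'));

        # Assumption: only http/https links are of interest. Relative links
        # have no scheme, and schemes such as mailto: have no host method.
        my $scheme = $uri->scheme;
        next unless defined $scheme && $scheme =~ /^https?$/;

        my $host = $uri->host;
        push @hosts, lc $host if defined $host && length $host;
    }
    return @hosts;
}

# Print each host once across all files given on the command line.
my %seen;
for my $file_name (@ARGV) {
    for my $host (hosts_in_file($file_name)) {
        print "$host\n" unless $seen{$host}++;
    }
}
```

Saved under a name such as `hosts.pl` (hypothetical), this would be run as `perl hosts.pl links.html`. The `%seen` hash keeps first-seen order; if sorted output is preferred, collecting the keys and printing `sort keys %seen` at the end is an alternative.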

perl uri html-treebuilder

Score: 5 · Answers: 1 · Views: 245
