Tag: html-treebuilder

Perl: extracting a pattern from an HTML file

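I have an .html file full of links, and I want to extract the domain names without the http:// (so just the hostname part of each link, e.g. blah.com), list them, and remove duplicates.

This is what I have come up with so far. I think the problem is the way I am trying to pass the $tree data: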

#!/usr/local/bin/perl -w

use HTML::TreeBuilder 5 -weak; # Ensure weak references in use
use URI;
  foreach my $file_name (@ARGV) {
    my $tree = HTML::TreeBuilder->new; # empty tree
    $tree->parse_file($file_name);
    my $u1 = URI->new($tree);
    print "host: ", $u1->host, "\n";
    print "Hey, here's a dump of the parse tree of $file_name:\n";

    # Now that we're done with it, we must destroy it.
    # $tree = $tree->delete; # Not required with weak references
  }
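
One likely cause: `URI->new` expects a URL string, while `$tree` is the whole parsed document, so `host` has nothing sensible to work on. Below is a minimal sketch of one way around this, assuming the links are ordinary `<a href="...">` elements. It finds each anchor with `look_down`, builds a `URI` from its `href`, and collects the hosts in a hash to drop duplicates. The helper name `unique_hosts` and the sample HTML are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder 5 -weak;   # weak references, so no explicit delete is needed
use URI;

# Hypothetical helper: return the sorted, de-duplicated host names
# of all <a href="..."> links found in an HTML string.
sub unique_hosts {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    my %seen;
    for my $link ( $tree->look_down( _tag => 'a', href => qr/./ ) ) {
        my $uri = URI->new( $link->attr('href') );

        # Relative links and schemes such as mailto: have no host method.
        next unless $uri->can('host');

        my $host = $uri->host;
        next unless defined $host && length $host;

        $seen{ lc $host } = 1;    # hash keys remove duplicates
    }
    return sort keys %seen;
}

# Made-up sample input for illustration.
my $sample = <<'HTML';
<p>
  <a href="http://blah.com/page1">one</a>
  <a href="http://blah.com/page2">two</a>
  <a href="https://example.org/">three</a>
  <a href="/local/path">four</a>
</p>
HTML

print "host: $_\n" for unique_hosts($sample);
```

To process the files named in `@ARGV` as in the question, `HTML::TreeBuilder->new_from_file($file_name)` could be used in place of `new_from_content`.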

perl uri html-treebuilder

5 votes · 1 answer · 245 views

OR match with HTML::TreeBuilder's look_down function

I am trying to match tr elements whose class begins with the three letters eve or day. This is my attempt:

my @stuff = $p->look_down(
    _tag => 'tr',
    class => 'qr/eve*|day*/g'
);

foreach (@stuff) {
        print $_->as_text;
};

Just curious: what kind of items end up in @stuff?

Would this work? See below:

my @stuff = $p->look_down(
    _tag => 'tr',
    class => qr/eve.*|day.*/
);

print "\n\n";

foreach (@stuff) {
        print $_->as_text . "\n\n";
};
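
For what it is worth, the first attempt passes the pattern as a quoted string, so `look_down` compares the class literally against the text `qr/eve*|day*/g` and finds nothing. In a real regex, `eve*` also means "ev followed by zero or more e", which is probably not what was intended. The second attempt is a real regex, but it is unanchored, so it would also match a class such as `seven`. The items in `@stuff` are `HTML::Element` objects, one per matching row. Here is a minimal sketch with an anchored pattern, assuming each row carries a single class name. The sample table is made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder 5 -weak;

# Made-up sample input for illustration.
my $html = <<'HTML';
<table>
  <tr class="event"><td>one</td></tr>
  <tr class="daytime"><td>two</td></tr>
  <tr class="seven"><td>three</td></tr>
</table>
HTML

my $p = HTML::TreeBuilder->new_from_content($html);

# A compiled regex (no quotes around qr//) is matched against the
# attribute value; ^ anchors the match to the start of the class.
my @stuff = $p->look_down(
    _tag  => 'tr',
    class => qr/^(?:eve|day)/,
);

# Each item is an HTML::Element, so element methods such as as_text work.
print $_->as_text, "\n" for @stuff;
```

If a row can carry several space-separated class names, a pattern such as `qr/(?:^|\s)(?:eve|day)/` may be a better fit.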

regex perl html-treebuilder

3 votes · 1 answer · 617 views

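Extracting all instances of a specific span class with HTML::TreeBuilder in Perl

I am trying to write a Perl script that opens an HTML file and extracts whatever is contained in the <span class="postertrip"> tags.

Sample HTML: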

<table>
   <tbody>
      <tr>
         <td class="doubledash">&gt;&gt;</td>
         <td class="reply" id="reply2">
            <a name="2"></a> <label><input type="checkbox" name="delete" value="1199313466,2" /> <span class="replytitle"></span> <span class="commentpostername"><a href="test">Test1</a></span><span class="postertrip"><a href="test">!AAAAAAAA</a></span>  08/01/03(Thu)02:06</label> <span class="reflink"> <a href="test">No.2</a> </span>&nbsp;  <br /> <span class="filesize">File: <a target="_blank" href="test">1199326003295.jpg</a> -(<em>65843 B, 288x412</em>)</span> <span class="thumbnailmsg">Thumbnail displayed, click image for full size.</span><br />  <a target="_blank" test"> <img src="test" width="139" height="200" alt="65843" class="thumb" /></a>    
            <blockquote>
               <p>Test message 1</p>
            </blockquote>
         </td>
      </tr>
   </tbody>
</table>
<table>
   <tbody>
      <tr>
         <td …
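
A minimal sketch of one possible approach, assuming the goal is the text inside every `<span class="postertrip">`: `look_down` with a plain string for `class` matches spans whose class attribute is exactly `postertrip`, and `as_text` returns the text of the span including the nested link. The helper name `poster_trips` is made up for illustration, and the sample below is cut down from the HTML in the question.

```perl
use strict;
use warnings;
use HTML::TreeBuilder 5 -weak;

# Hypothetical helper: return the text of every <span class="postertrip">
# found in an HTML string, in document order.
sub poster_trips {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    return map { $_->as_text }
           $tree->look_down( _tag => 'span', class => 'postertrip' );
}

# Shortened version of the sample HTML, for illustration only.
my $sample = <<'HTML';
<table><tbody><tr>
  <td class="reply" id="reply2">
    <span class="commentpostername"><a href="test">Test1</a></span><span class="postertrip"><a href="test">!AAAAAAAA</a></span>
  </td>
</tr></tbody></table>
HTML

print "$_\n" for poster_trips($sample);
```

When reading from a file instead of a string, `HTML::TreeBuilder->new_from_file($file_name)` could replace `new_from_content`.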

html perl html-treebuilder

3 votes · 2 answers · 105 views
