使用bash/Perl中的RegEx从html表中提取值

Mag*_*dud 2 regex bash perl monitoring health-monitoring

我想用munin监视我的oki打印机,所以我试图让这个插件适应我的打印机.

我的打印机http服务器中的页面表是:

<table width="560" border="0" cellspacing="2" cellpadding="3">
    <tr class="sub_item_color">
        <td  class="normal" width="200" align="right" valign="bottom" rowspan="2">Media Size</td>
        <td  class="normal" width="90" align="left">Color</td>
        <td  class="normal" width="90" align="left">Color</td>
        <td  class="normal" width="90" align="left">Mono</td>
        <td  class="normal" width="90" align="left">Mono</td>
    </tr>
    <tr class="sub_item_color">
        <td  class="normal" width="90" align="left">A3/Tabloid</td>
        <td  class="normal" width="90" align="left">A4/Letter</td><td  class="normal" width="90" align="left">A3/Tabloid</td>
        <td  class="normal" width="90" align="left">A4/Letter</td>
    </tr>
    <tr class="sub_item_color">
        <td  class="normal" width="200" align="left">Total Impressions</td>
        <td  class="normal" width="90" align="right">21906</td>
        <td  class="normal" width="90" align="right">33491</td>
        <td  class="normal" width="90" align="right">2084</td>
        <td  class="normal" width="90" align="right">4460</td>
    </tr>
    <tr class="sub_item_color">
        <td  class="normal" width="200" align="left">Total A4/Letter Impressions</td>
        <td  class="normal" colspan="2" align="center"><b>Color:77303</B></td>
        <td  class="normal" colspan="2" align="center"><b>Mono:8628</B></td>
    </tr>
</table>
Run Code Online (Sandbox Code Playgroud)

那个munin脚本正在这样做:

infopage=`wget -q -O - http://root:$password@$destination/printer/printerinfo_top.htm | perl -p -e 's/\n/ /m'`
echo tray1.value    `echo $infopage | perl -p -e 's/^.+Tray\ 1\ Page\ Count\:\ \<\/TD\>\<TD\ WIDTH\=\"94\"\>([0-9]+)\<.+$/$1/'`
Run Code Online (Sandbox Code Playgroud)

我如何获得总展示次数?

dax*_*xim 9

解决方案实现为像问题中的Unix过滤器,由于XPath,只有更多的可读性和声明性.

#!/usr/bin/env perl
use 5.010;
use strictures;
use HTML::TreeBuilder::XPath qw();
use List::Util qw(sum);
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content(<>);
say sum map { s[.*:][]; $_ } $tree->findnodes_as_strings('//table/tr/td[@colspan=2]/b');
Run Code Online (Sandbox Code Playgroud)
wget -q -O - http://… | perl sum-total-impressions.pl
Run Code Online (Sandbox Code Playgroud)

  • +1:这是为什么你不应该使用正则表达式解析HTML的一个很好的例子.对于这个问题,主要问题是最初的**解决方案**使用正则表达式来解析HTML而不是XPath. (3认同)