rik*_*kki -1 html perl html-table
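I am new to Perl programming and I am badly stuck. I have to parse an HTML file that contains a single table, and extract one row from it for which I already know the value of one of the column entries.
My HTML file looks like this: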
many previous rows description in html format....
<td>some_value_default</td>
<td>0x0</td>
<td><a href="something" target="xyz">something</a></td>
<td>abcd</td>
//*
<tr><a name="Maximum_Capacity"></a>
<td>some 23:4</td>
<td>some_27: 15</td>
<td>24:29</td>
<td>17</td>
<td colspan=3>Maximum_Capacity</td>
<td colspan=5>
some commonly use value are: 24:31|25:67|677:89|xyz abc
</td>
//*
<td>some_value_default</td>
<td> 0x0</td>
<td><a href="something.html" target="ren">sometext</a></td>
<td>again some text</td>
description of many rows in html afterwards...
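The lines between the //* markers are the row I want to fetch. I want to use the information it contains. How can I get that row into an array so that each column entry is stored as an array element?
Any help would be appreciated.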
Use HTML::TableExtract to process tables in an HTML document. It is a very good tool.
A very basic example:
use warnings;
use strict;
use feature 'say';
use List::MoreUtils qw(none);
use HTML::TableExtract;

my $file = shift @ARGV;
die "Usage: $0 html-file\n" if not $file or not -f $file;

my $html = do {  # read the whole file into $html string
    local $/;
    open my $fh, '<', $file or die "Can't open $file: $!";
    <$fh>;
};

my $te = HTML::TableExtract->new;
$te->parse($html);

# Print all tables in this html page
foreach my $ts ($te->tables) {
    say "Table (", join(',', $ts->coords), "):";
    foreach my $row ($ts->rows) {
        say "\t", join ',', grep { defined } @$row;
    }
}

# Assume that the table of interest is the second one
# (use index 0 if the page contains only one table)
my $table = ($te->tables)[1];
foreach my $row ($table->rows) {
    # Select the row you need; for example, identify distinct text in a cell
    next if none { defined and /Maximum_Capacity/ } @$row;
    say "\t", join ',', grep { defined } @$row;
}
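The module provides many ways to set parsing preferences, specify tables, retrieve elements, use headers, and so on. See the documentation and search this site for related posts.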
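As one illustration of specifying a table, here is a minimal sketch that uses the depth and count constructor options to pick the first top-level table and then copies the matching row into an array. The sample markup, the @wanted array, and the exact-match test on the cell text are assumptions made up for this example, not taken from the real file.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample markup standing in for the real file
my $html = <<'HTML';
<table>
<tr><td>Name</td><td>Bits</td><td>Default</td></tr>
<tr><td>Minimum_Capacity</td><td>7:0</td><td>0x0</td></tr>
<tr><td>Maximum_Capacity</td><td>24:29</td><td>0x11</td></tr>
</table>
HTML

# depth => 0, count => 0 selects the first top-level table in the document
my $te = HTML::TableExtract->new(depth => 0, count => 0);
$te->parse($html);

my @wanted;    # will hold one array element per column entry
for my $table ($te->tables) {
    for my $row ($table->rows) {
        # Cells can be undef (for example positions covered by a colspan),
        # so drop those and trim surrounding whitespace from the rest
        my @cells = map { s/^\s+|\s+$//gr } grep { defined } @$row;
        next unless grep { $_ eq 'Maximum_Capacity' } @cells;
        @wanted = @cells;
    }
}

print join('|', @wanted), "\n";
```

With a single table in the page this avoids guessing an index into the list returned by tables. The s///r modifier needs Perl 5.14 or newer.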
I use none from List::MoreUtils to test whether no element of a list satisfies a condition.
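To make the behaviour of none concrete, here is a small sketch on hand-written rows. The @rows data is invented for illustration, and the sketch imports none from the core List::Util module (which exports it since version 1.33) instead of List::MoreUtils so that it runs without extra installs; for this use the two behave the same.

```perl
use strict;
use warnings;
use List::Util qw(none);    # List::Util 1.33+ exports none

# Invented rows shaped like what HTML::TableExtract returns:
# array references whose cells may be undef
my @rows = (
    [ 'some_value_default', '0x0', undef, 'abcd' ],
    [ '24:29', '17', 'Maximum_Capacity', undef ],
);

my @kept;
for my $row (@rows) {
    # none is true when the block is false for every element,
    # so this skips rows where no defined cell mentions Maximum_Capacity
    next if none { defined($_) && /Maximum_Capacity/ } @$row;
    push @kept, $row;
}

printf "%d row(s) kept\n", scalar @kept;
```

The defined check comes first so that the regex is never applied to an undef cell, which would otherwise raise an "uninitialized value" warning.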