Regex to get a column's value based on another column

Ami*_*mir 1 html regex perl

I have this:

<table border="1" cellspacing="1" cellpadding="0">

<tbody>

<tr><th class="align-left" style="text-align: left;">Name</th><th>Type</th><th>Size</th><th>Values</th><th>Description</th><th>Attributes</th><th>Default</th></tr>

<tr>

<td>E-mail</td>

<td>text</td>

<td>60</td>

<td>test@test.com</td>

<td>&#160;</td>

<td>M</td>

<td>test@test.com</td>

</tr>

<tr>

<td>Phone</td>

<td>text</td>

<td>20</td>

<td>01-250 481 00</td>

<td>&#160;</td>

<td>&#160;</td>

<td>&#160;</td>

</tr>

</tbody>

</table>

This is how the code renders:

[screenshot of the rendered table]

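I want to use a regex to extract the (Values) entry based on the (Name) to its left, but I don't know whether that is possible...

For example, I want to search for "Phone" and get "01-250 481 00".

What do you think?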

Ste*_*ker 5

Don't use regular expressions to parse HTML. Use an HTML parser to turn the HTML into a DOM tree, then do your work on that tree. For example:

use strict;
use warnings;
use HTML::TreeBuilder;

# $html_string is assumed to hold the HTML shown in the question
my $parser = HTML::TreeBuilder->new;
my $root   = $parser->parse_content($html_string);

my $table = $root->look_down(_tag => 'table');
my @rows  = $table->look_down(_tag => 'tr');
for my $row (@rows) {
    # perform your row operation here using HTML::Element methods
    # search, replace, insert, modify content...

    # collect the cells of this row (the header row has only <th>, so it yields none)
    my @columns = $row->look_down(_tag => 'td');

    # we need 1st (Name) and 4th (Values) column
    if (@columns >= 4) {
        if ($columns[0]->as_trimmed_text() eq "Phone") {
            my $number = $columns[3]->as_trimmed_text();
            ...    # placeholder: use $number here
        }
    }
}

# if you need to dump the modified tree again...
print $root->as_HTML();

# IMPORTANT: must be done after done with DOM tree!
$root->delete();
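
If you need more than one lookup, the same approach can fill a hash keyed by the Name column. This is only a minimal sketch: it assumes HTML::TreeBuilder is installed, uses a shortened copy of the table from the question as sample input, and the names `$html` and `%value_for` are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input: a shortened copy of the table in the question.
my $html = <<'HTML';
<table>
<tr><th>Name</th><th>Type</th><th>Size</th><th>Values</th></tr>
<tr><td>E-mail</td><td>text</td><td>60</td><td>test@test.com</td></tr>
<tr><td>Phone</td><td>text</td><td>20</td><td>01-250 481 00</td></tr>
</table>
HTML

my $root = HTML::TreeBuilder->new_from_content($html);

# Map each row's Name cell (1st column) to its Values cell (4th column).
my %value_for;
for my $row ($root->look_down(_tag => 'tr')) {
    my @cells = $row->look_down(_tag => 'td');
    next if @cells < 4;    # skips the header row, which has only <th> cells
    $value_for{ $cells[0]->as_trimmed_text } = $cells[3]->as_trimmed_text;
}

# Free the tree once the values have been copied out.
$root->delete;

print "$value_for{Phone}\n";
```

After that, `$value_for{'E-mail'}` and any other Name can be looked up without walking the tree again.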