Perl regex: multiple groups

jer*_*ran 1 regex perl screen-scraping html-parsing

I am trying to do some screen scraping in Perl and have boiled the input down to a set of table elements.

The string:

<tr>
        <td>10:11:00</td>
        <td><a href="/page/controller/33">712</a></td>
        <td>Start</td>
        <td>Finish</td>
        <td>200</td>
        <td>44</td>

Code:

if($item =~ /<td>(.*)?<\/td>/)
            {
                print "\t$item\n";
                print "\t1: $1\n";
                print "\t2: $2\n";
                print "\t3: $3\n";
                print "\t4: $4\n";
                print "\t5: $5\n";
                print "\t6: $6\n";
            }

Output:

1: 10:11:00
2: 
3: 
4: 
5: 
6: 

I have tried several approaches but cannot get the expected result. Any ideas?

per*_*eal 5

use strict;
use warnings;

my $item = <<EOF;
<tr>
        <td>10:11:00</td>
        <td><a href="/page/controller/33">712</a></td>
        <td>Start</td>
        <td>Finish</td>
        <td>200</td>
        <td>44</td>
EOF

# With /g in list context, the match returns the captured text of every <td> cell.
if(my @v = ($item =~ /<td>(.*?)<\/td>/g))
{
  print "\t$item\n";
  print "\t1: $v[0]\n";
  print "\t2: $v[1]\n";
  print "\t3: $v[2]\n";
  print "\t4: $v[3]\n";
  print "\t5: $v[4]\n";
  print "\t6: $v[5]\n";
}

Or:

if(my @v = ($item =~ /<td>(.*?)<\/td>/g))
{
  print "\t$item\n";
  print "\t$_: $v[$_-1]\n" for 1..@v;
}

Output:

1: 10:11:00
2: <a href="/page/controller/33">712</a>
3: Start
4: Finish
5: 200
6: 44
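
For anyone wondering why the original attempt filled only $1: the pattern has a single capturing group, so one match can set only $1, and $2 through $6 stay undefined. The /g modifier in list context repeats the match and returns each capture in turn. Below is a minimal sketch of that difference, reusing the sample row from the question. The names @once, @cells and @text are my own, and the tag-stripping step is an extra assumption that was not part of the answer above.

```perl
use strict;
use warnings;

# Sample row copied from the question.
my $item = <<'EOF';
<tr>
        <td>10:11:00</td>
        <td><a href="/page/controller/33">712</a></td>
        <td>Start</td>
        <td>Finish</td>
        <td>200</td>
        <td>44</td>
EOF

# Without /g the pattern matches once, so its single group yields one value.
my @once = $item =~ /<td>(.*?)<\/td>/;

# With /g in list context, every match contributes its captured text.
my @cells = $item =~ /<td>(.*?)<\/td>/g;

# Hypothetical extra step: strip nested tags so only the cell text remains.
my @text = map { my $c = $_; $c =~ s/<[^>]+>//g; $c } @cells;

print scalar(@once), " capture without /g, ", scalar(@cells), " with /g\n";
print "\t$_: $text[$_-1]\n" for 1 .. @text;
```

With the tags stripped, the second cell comes out as 712 instead of the whole anchor element. A regex like this is only reasonable for small, regular snippets such as this row. For arbitrary HTML a real parser is the safer choice.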