jer*_*ran 1 regex perl screen-scraping html-parsing
I'm trying to do some screen scraping in Perl and have it boiled down to a set of table elements.
The string:
<tr>
<td>10:11:00</td>
<td><a href="/page/controller/33">712</a></td>
<td>Start</td>
<td>Finish</td>
<td>200</td>
<td>44</td>
Code:
if($item =~ /<td>(.*)?<\/td>/)
{
print "\t$item\n";
print "\t1: $1\n";
print "\t2: $2\n";
print "\t3: $3\n";
print "\t4: $4\n";
print "\t5: $5\n";
print "\t6: $6\n";
}
Output:
1: 10:11:00
2:
3:
4:
5:
6:
I've tried several variations but can't get the expected result. Any ideas?
use strict;
use warnings;
my $item = <<EOF;
<tr>
<td>10:11:00</td>
<td><a href="/page/controller/33">712</a></td>
<td>Start</td>
<td>Finish</td>
<td>200</td>
<td>44</td>
EOF
if(my @v = ($item =~ /<td>(.*)<\/td>/g))
{
print "\t$item\n";
print "\t1: $v[0]\n";
print "\t2: $v[1]\n";
print "\t3: $v[2]\n";
print "\t4: $v[3]\n";
print "\t5: $v[4]\n";
print "\t6: $v[5]\n";
}
Or:
if(my @v = ($item =~ /<td>(.*)<\/td>/g))
{
print "\t$item\n";
print "\t$_: $v[$_-1]\n" for 1..@v;
}
Output:
1: 10:11:00
2: <a href="/page/controller/33">712</a>
3: Start
4: Finish
5: 200
6: 44
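
Why this works, as a minimal sketch rather than a definitive implementation: the pattern contains a single capture group, so a plain match can only ever fill `$1`, whereas the `/g` modifier in list context returns the capture from every match. The sample `$row` string below is a hypothetical input (not the asker's data), chosen to put two cells on one line so that it also shows why a non-greedy `.*?` is safer than the greedy `.*` used above.

```perl
use strict;
use warnings;

# Hypothetical sample input: two cells on the first line, one on the second.
my $row = "<tr><td>10:11:00</td><td>712</td>\n<td>Start</td>\n";

# One capture group and no /g: only the first match is found, so only $1 is set.
if ($row =~ /<td>(.*?)<\/td>/) {
    print "first only: $1\n";
}

# List context with /g: each successive match adds its capture to the list.
my @cells = $row =~ /<td>(.*?)<\/td>/g;
print "cell $_: $cells[$_ - 1]\n" for 1 .. @cells;

# Scalar context with /g: each iteration resumes where the previous match ended.
while ($row =~ /<td>(.*?)<\/td>/g) {
    print "loop: $1\n";
}

# Greedy .* runs to the last </td> on a line, merging cells that share a line.
my @greedy = $row =~ /<td>(.*)<\/td>/g;
print "greedy found ", scalar(@greedy), " cells\n";
```

The greedy version happens to work on the input in the question only because each `<td>` sits on its own line and `.` does not match a newline. For anything less regular than this, a real HTML parser is likely a more robust choice than a regex.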