我正在尝试解析这个HTML块:
<div class="v120WrapperInner"><a href="/redirect?q=http%3A%2F%2Fwww.google.com%2Faclk%3Fsa%3DL%26ai%3DCKJh--O7tSsCVIKeyoQTwiYmRA5SnrIsB1szYhg2d2J_EAhABIJ7rxQ4oA1CLk676B2DJntmGyKOQGcgBAaoEFk_Qyu5ipY7edN5ETLuchKUCHbY4SA#0%26num%3D1%26sig%3DAGiWqtwtAf8NslosN7AuHb7qC7RviHVg7A%26q%3Dhttp%3A%2F%2Fwww.youtube.com%2Fwatch%253Fv%253D91sYT_8CN8Q%2526feature%253Dpyv%2526ad%253D3409309746%2526kw%253Dsusan%25252#0boyle&adtype=pyv&event=ad&usg=bR7ErKA_3szWtQMGe2lt1dpxzHc=" title="The Valley Downs Chicago"><img class="vimg120" alt="The Valley Downs Chicago" src="http://i2.ytimg.com/vi/91sYT_8CN8Q/1.jpg">
Run Code Online (Sandbox Code Playgroud)
捕获重定向链接:
/redirect?q=http%3A%2F%2Fwww.google.com%2Faclk%3Fsa%3DL%26ai%3DCKJh--O7tSsCVIKeyoQTwiYmRA5SnrIsB1szYhg2d2J_EAhABIJ7rxQ4oA1CLk676B2DJntmGyKOQGcgBAaoEFk_Qyu5ipY7edN5ETLuchKUCHbY4SA#0%26num%3D1%26sig%3DAGiWqtwtAf8NslosN7AuHb7qC7RviHVg7A%26q%3Dhttp%3A%2F%2Fwww.youtube.com%2Fwatch%253Fv%253D91sYT_8CN8Q%2526feature%253Dpyv%2526ad%253D3409309746%2526kw%253Dsusan%25252#0boyle&adtype=pyv&event=ad&usg=bR7ErKA_3szWtQMGe2lt1dpxzHc=
Run Code Online (Sandbox Code Playgroud)
和视频标题:
The Valley Downs Chicago
Run Code Online (Sandbox Code Playgroud)
当我使用这个简单的Perl代码时:
foreach $_ (@promotedVideos)
{
if (/\s<div class="v120WrapperInner"><a href="([^"]*)" title="([^"]*)"><img/six)
{
print $1;
print $2;
}
}
Run Code Online (Sandbox Code Playgroud)
没有打印.虽然我正在对此进行故障排除,但我想如果您发现任何错误或有问题,我会问您专家.非常感谢您的帮助!
你的/ x正则表达式修饰符会混淆空格.去掉它.
也就是说,它应该是
if (/\s<div class="v120WrapperInner"><a href="([^"]*)" title="([^"]*)"><img/si)
Run Code Online (Sandbox Code Playgroud)
/ x使perl忽略正则表达式中的空格,使你的正则表达式相当于以下内容:
/\s<divclass="v120WrapperInner"><a href="([^"]*)"title="([^"]*)"><img/six
Run Code Online (Sandbox Code Playgroud)
那是不匹配的.
在开始的时候也可能会制造东西.
这是我用于测试的代码:
use strict;
my $inp = '<div class="v120WrapperInner"><a href="/redirect?q=http%3A%2F%2Fwww.google.com%2Faclk%3Fsa%3DL%26ai%3DCKJh--O7tSsCVIKeyoQTwiYmRA5SnrIsB1szYhg2d2J_EAhABIJ7rxQ4oA1CLk676B2DJntmGyKOQGcgBAaoEFk_Qyu5ipY7edN5ETLuchKUCHbY4SA#0%26num%3D1%26sig%3DAGiWqtwtAf8NslosN7AuHb7qC7RviHVg7A%26q%3Dhttp%3A%2F%2Fwww.youtube.com%2Fwatch%253Fv%253D91sYT_8CN8Q%2526feature%253Dpyv%2526ad%253D3409309746%2526kw%253Dsusan%25252#0boyle&adtype=pyv&event=ad&usg=bR7ErKA_3szWtQMGe2lt1dpxzHc=" title="The Valley Downs Chicago"><img class="vimg120" alt="The Valley Downs Chicago" src="http://i2.ytimg.com/vi/91sYT_8CN8Q/1.jpg">';
print "$inp\n";
if ( $inp =~ /<div class="v120WrapperInner"><a href="([^"]*)" title="([^"]*)"><img/si )
{
print "m:\n$1\n$2\n";
}
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
302 次 |
| 最近记录: |