How do I extract hrefs from the body of an email in Perl?

snn*_*fer 0 regex email perl html-parsing

I'm trying to extract some URLs, possibly more than one, from the body of an email.

I'm trying to parse out the URLs with this:

use strict;
use warnings;
use Net::IMAP::Simple;
use Email::Simple;
use IO::Socket::SSL;

# IMAP connection setup omitted to save space

my $es = Email::Simple->new( join '', @{ $imap->get($i) } );
my $text = $es->body;
print $text;
my $matches = ($text =~/<a[^>]*href="([^"]*)"[^>]*>.*<\/a>/);
print $matches;

$text contains the following:

    --047d7b47229eb3d9f404e58fd90a
    Content-Type: text/plain; charset=ISO-8859-1

    Try1 <http://www.washingtonpost.com/>

    Try2 <http://www.thesun.co.uk/sol/homepage/>

    --047d7b47229eb3d9f404e58fd90a
    Content-Type: text/html; charset=ISO-8859-1

    <div dir="ltr"><a href="http://www.washingtonpost.com/">Try1</a><br><div><br></div><div><a href="http://www.thesun.co.uk/sol/homepage/">Try2</a><br></div></div>

    --047d7b47229eb3d9f404e58fd90a--

The program's output is just 1, and nothing else.

Can anyone help?

Thanks for any advice.

dax*_*xim 6

Email::Simple is not suitable for MIME messages; use Courriel instead. Regular expressions are not suitable for parsing HTML; use Web::Query instead.

use Courriel qw();
use Web::Query qw();

my $email = Courriel->parse( text => join …);
my $html = $email->html_body_part->content;  # decoded HTML string of the text/html part
my @url = Web::Query->new_from_html($html)->find('a[href]')->attr('href');
__END__
http://www.washingtonpost.com/
http://www.thesun.co.uk/sol/homepage/
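As for why the original snippet printed only 1: the match was assigned to a scalar, so Perl returned the success flag rather than the captured text. The following is a minimal sketch using only core Perl, assuming the HTML part has already been isolated into a string (the variable names $html, $flag, @greedy and @urls are introduced here for illustration). It also shows that the original pattern's greedy .* swallows everything up to the last </a>, which is one example of why a real HTML parser is the more robust choice.

```perl
use strict;
use warnings;

# Sample HTML part copied from the message in the question.
my $html = '<div dir="ltr"><a href="http://www.washingtonpost.com/">Try1</a><br>'
         . '<div><br></div><div><a href="http://www.thesun.co.uk/sol/homepage/">Try2</a><br></div></div>';

# Scalar assignment: the outer parentheses only group, so the match
# yields its success flag (1), not the captured text.
my $flag = ( $html =~ /<a[^>]*href="([^"]*)"[^>]*>.*<\/a>/ );

# List assignment with /g returns the captures, but the greedy .* runs
# to the last </a>, so only the first link is found.
my @greedy = $html =~ /<a[^>]*href="([^"]*)"[^>]*>.*<\/a>/g;

# Matching just the opening tag with /g in list context collects every
# double-quoted href value.
my @urls = $html =~ /<a\b[^>]*\bhref="([^"]*)"/g;

print "flag: $flag\n";
print "greedy: @greedy\n";
print "$_\n" for @urls;
```

The last pattern still assumes double-quoted attribute values and well-formed tags, so it should be read as a quick diagnostic rather than a replacement for the parser-based approach.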