Need advice on a good way to find the content of a div

Iva*_*ang 3 html regex perl dom

<div class="box notranslate" id="venueHours">
<h5 class="translate">Hours</h5>
<div class="status closed">Currently closed</div>
<div class="hours">
  <div class="timespan">
    <div class="openTime">
      <div class="days">Mon,Tue,Wed,Thu,Sat</div>
      <span class="hours"> 10:00 AM–6:00 PM</span>
    </div>
  </div>
  <div class="timespan">
    <div class="openTime">
      <div class="days">Fri</div>
      <span class="hours"> 10:00 AM–9:00 PM</span></div>
    </div>
    <div class="timespan">
      <div class="openTime">
        <div class="days">Sun</div>
        <span class="hours"> 10:00 AM–5:00 PM</span>
      </div>
    </div>
  </div>
</div>

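I am trying to capture all of the content inside <div class="days"> and <span class="hours">. I think I could use a regular expression for this task, but I would also like to learn any interesting or more professional ways to capture specific div blocks like these. Thanks.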

Joe*_*ger 7

Besides the HTML parsing libraries mentioned elsewhere, other modules also offer DOM functionality. See, for example, Web::Query or Mojolicious' Mojo::DOM.

Here is an example using Mojo::DOM with CSS3 selectors:

#!/usr/bin/env perl

use strict;
use warnings;

use 5.10.0;
use Mojo::DOM;

my $dom = Mojo::DOM->new(<<'HTML');
<div class="box notranslate" id="venueHours">
<h5 class="translate">Hours</h5>
<div class="status closed">Currently closed</div>
<div class="hours">
  <div class="timespan">
    <div class="openTime">
      <div class="days">Mon,Tue,Wed,Thu,Sat</div>
      <span class="hours"> 10:00 AM–6:00 PM</span>
    </div>
  </div>
  <div class="timespan">
    <div class="openTime">
      <div class="days">Fri</div>
      <span class="hours"> 10:00 AM–9:00 PM</span></div>
    </div>
    <div class="timespan">
      <div class="openTime">
        <div class="days">Sun</div>
        <span class="hours"> 10:00 AM–5:00 PM</span>
      </div>
    </div>
  </div>
</div>
HTML

say "div days:";
say $_->text for $dom->find('div.days')->each;

say "\nspan hours:";
say $_->text for $dom->find('span.hours')->each;

Or, equivalently:

say "div days:";
say for $dom->find('div.days')->map(sub{$_->text})->each;

say "\nspan hours:";
say for $dom->find('span.hours')->map(sub{$_->text})->each;

Output:

div days:
Mon,Tue,Wed,Thu,Sat
Fri
Sun

span hours:
 10:00 AM–6:00 PM
 10:00 AM–9:00 PM
 10:00 AM–5:00 PM

Or, to get the hours that go with each set of days, you can use the children of the openTime divs:

say "Open Times:";
say for $dom->find('div.openTime')
            ->map(sub{$_->children->each})
            ->map(sub{$_->text})
            ->each;

Output:

Open Times:
Mon,Tue,Wed,Thu,Sat
 10:00 AM–6:00 PM
Fri
 10:00 AM–9:00 PM
Sun
 10:00 AM–5:00 PM
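If you want each set of days explicitly paired with its hours (rather than one interleaved list), one option is to loop over the div.openTime elements and look up the div.days and span.hours inside each one with at(). This is a sketch that assumes the markup keeps exactly the shape shown in the question; the @schedule and %hours_for variables, and using Mojo::Util's trim to strip the leading space from the hours, are my own additions rather than part of the answer above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.10.0;
use Mojo::DOM;
use Mojo::Util qw(trim);

# The div.hours part of the markup from the question, re-indented
my $html = <<'HTML';
<div class="hours">
  <div class="timespan">
    <div class="openTime">
      <div class="days">Mon,Tue,Wed,Thu,Sat</div>
      <span class="hours"> 10:00 AM–6:00 PM</span>
    </div>
  </div>
  <div class="timespan">
    <div class="openTime">
      <div class="days">Fri</div>
      <span class="hours"> 10:00 AM–9:00 PM</span>
    </div>
  </div>
  <div class="timespan">
    <div class="openTime">
      <div class="days">Sun</div>
      <span class="hours"> 10:00 AM–5:00 PM</span>
    </div>
  </div>
</div>
HTML

my $dom = Mojo::DOM->new($html);

# Hypothetical result containers for this sketch
my @schedule;     # [ days, hours ] pairs in document order
my %hours_for;    # single day name => hours

# Look up days and hours relative to the same openTime so they stay paired
for my $open ($dom->find('div.openTime')->each) {
    my $days  = $open->at('div.days');
    my $hours = $open->at('span.hours');
    next unless $days && $hours;    # skip blocks missing either part

    my $span = trim($hours->text);  # drop the leading space
    push @schedule, [ trim($days->text), $span ];

    # "Mon,Tue,Wed,Thu,Sat" becomes one hash entry per day
    $hours_for{$_} = $span for split /,/, trim($days->text);
}

say "$_->[0]: $_->[1]" for @schedule;
say "Saturday: $hours_for{Sat}";
```

Scoping the two lookups to the same openTime element avoids relying on the days and hours lists happening to line up by index, which could break if one block were missing its span.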

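Edit: daxim posted similar Web::Query code as a comment, so I am reposting it here for better formatting. I have not tried it, but I generally trust his code. It assumes the HTML is in the variable $html: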

use 5.10.0;    # enables say
use Web::Query qw();
my $w = Web::Query->new_from_html($html);
say "div days:";
say for $w->find('div.days')->text; 
say "\nspan hours:"; 
say for $w->find('span.hours')->text; 
say "Open Times:"; 
$w->find('div.openTime')->each(sub { say for $_->find('*')->text });