Querying metadata from an HTML page with SPARQL returns nothing

wil*_*ilx 8 perl html5 rdf sparql microdata

I seem to be having trouble with HTML::HTML5::Microdata::Parser, with RDF::Query, or with SPARQL syntax and semantics. I am interested in this part of a news site's page:

<div class="authors">
Autoři: <span itemprop="author" itemscope itemtype="http://schema.org/Person"><a rel="author" itemprop="url" class="name" href="http://vice.idnes.cz/novinari.aspx?idnov=2504" ><span itemprop="name">Zdeňka Trachtová</span></a></span>
,
<span itemprop="author" itemscope itemtype="http://schema.org/Person"><a rel="author" itemprop="url"  href="http://vice.idnes.cz/novinari.aspx?idnov=3495" ><span itemprop="additionalName">san</span></a><span class="h" itemprop="name">Sabina Netrvalová</span></span>
</div>

Here is my test code:

#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;
use HTML::HTML5::Microdata::Parser;
use RDF::Query;
use IO::Handle;
use LWP::Simple;


STDOUT->binmode(":utf8");
STDERR->binmode(":utf8");

my $htmldoc = LWP::Simple::get(
    "http://zpravy.idnes.cz/zacinaji-zapisy-do-prvnich-trid-dn3-/domaci.aspx?c=A160114_171615_domaci_zt");
# LWP::Simple::get returns undef on failure and does not set $@
die "Could not fetch URL.\n" unless defined $htmldoc;

my $microdata = HTML::HTML5::Microdata::Parser->new (
    $htmldoc, $ARGV[0],
    {auto_config => 1, tdb_service => 1, xhtml_meta => 1, xhtml_rel => 1});
print STDERR "microdata->graph:\n", Dumper($microdata->graph), "\n";

my $query = RDF::Query->new(<<'SPARQL');
PREFIX schema: <http://schema.org/>
SELECT *
WHERE {
   ?author a schema:Person .
}
SPARQL

my $people = $query->execute($microdata->graph);
print STDERR "authors from RDF:\n", Dumper($people), "\n";
while (my $person = $people->next) {
    print STDERR "people: ", $person, "\n";
}

The options passed to HTML::HTML5::Microdata::Parser are just my latest attempt to make this work. (I basically have no idea what I'm doing.)

Any ideas how to make this work and get the authors' names?

Mil*_*ler 2

Just use Mojo::UserAgent and Mojo::DOM:

use strict;
use warnings;
use utf8;
use v5.10;

BEGIN {
    binmode *STDOUT, ':utf8';
    binmode *STDERR, ':utf8';
}

use Mojo::UserAgent;

my $url = "http://zpravy.idnes.cz/zacinaji-zapisy-do-prvnich-trid-dn3-/domaci.aspx?c=A160114_171615_domaci_zt";

my $dom = Mojo::UserAgent->new->get($url)->res->dom;

# Process all authors
for my $span ($dom->find('span[itemprop=author]')->each) {
    say $span->all_text;
}

Output:

Zdeňka Trachtová
san Sabina Netrvalová
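Because `all_text` collects every text node under the author `<span>`, the second author's `additionalName` ("san") ends up in front of the name. If you only want the `itemprop="name"` value, a minimal sketch is to look up that property inside each author element. This assumes the author markup from the question is parsed offline (no HTTP request), and `author_names` is a hypothetical helper, not part of Mojo. It is a CSS-selector shortcut, not a full microdata parser: it ignores `itemref` and would also pick up a `name` belonging to a nested item.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use v5.10;
use Mojo::DOM;

binmode STDOUT, ':encoding(UTF-8)';

# Author markup copied from the question (offline input, no fetch).
my $html = <<'HTML';
<div class="authors">
Autoři: <span itemprop="author" itemscope itemtype="http://schema.org/Person"><a rel="author" itemprop="url" class="name" href="http://vice.idnes.cz/novinari.aspx?idnov=2504"><span itemprop="name">Zdeňka Trachtová</span></a></span>
,
<span itemprop="author" itemscope itemtype="http://schema.org/Person"><a rel="author" itemprop="url" href="http://vice.idnes.cz/novinari.aspx?idnov=3495"><span itemprop="additionalName">san</span></a><span class="h" itemprop="name">Sabina Netrvalová</span></span>
</div>
HTML

# Hypothetical helper: for each schema.org Person marked as an author,
# return the text of its first itemprop="name" descendant, skipping other
# properties such as additionalName. itemprop is a space-separated token
# list, so ~= is used instead of =.
sub author_names {
    my ($markup) = @_;
    my $dom = Mojo::DOM->new($markup);
    return $dom->find('[itemprop~=author][itemtype="http://schema.org/Person"]')
        ->map(sub {
            my $name = $_->at('[itemprop~=name]');
            $name ? $name->all_text : ();    # skip authors without a name
        })
        ->each;
}

say for author_names($html);
```

To use it on the live page, pass the markup of `Mojo::UserAgent->new->get($url)->res->body` (or the stringified `$dom`) to `author_names` instead of the embedded snippet.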

For a short 8-minute tutorial on these modules, check out Mojocast Episode 5.
