How do I get child nodes by node type in XML::LibXML?

jac*_*ter 3 perl cdata libxml2 xml-parsing

I am parsing a complex XML document, one section of which may look like this:

<mds>
  <md>
    <value>
      <![CDATA[<?xml version="1.0" encoding="UTF-8"?><record>...</record>]]>
    </value>
  </md>
</mds>

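When I parse the value node, it actually contains three child nodes: two whitespace-only text nodes and one CDATA node. Is there an easy way to get just the CDATA node, something like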

my @dcvalues = $dom->findnodes("//mds/md/value");
my @cdatanodes = $dcvalues[0]->find(<some xpath that only returns cdata nodes>);
my $cdataval = $cdatanodes[0]->textContent;

You get the idea. Edit: I know that in this example I can reach the CDATA node with

my $cdatanode = $dcvalues[0]->firstChild->nextSibling;

But then I am relying on the CDATA node always being the second child, and I am not sure that holds.

Bor*_*din 6

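You need the no_blanks parser option, which discards the whitespace-only text nodes so that the CDATA section is the only child left under value. Like this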

use strict;
use warnings;
use 5.010;

use XML::LibXML;

my $xml = XML::LibXML->load_xml(string => <<END_XML, {no_blanks => 1});
<mds>
  <md>
    <value>
      <![CDATA[<?xml version="1.0" encoding="UTF-8"?><record>...</record>]]>
    </value>
  </md>
</mds>
END_XML


my @values = $xml->findnodes('//mds/md/value/text()');

say scalar @values;

say $values[0]->textContent;

Output

1
<?xml version="1.0" encoding="UTF-8"?><record>...</record>
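If changing the parser options is not possible, a minimal sketch of two alternatives follows, assuming the same sample document as above. The first filters the children of value by node type, using the XML_CDATA_SECTION_NODE constant exported by the :libxml tag; the second uses an XPath predicate to skip whitespace-only text nodes. The variable names ($dom, $value, @cdata, @non_blank) are only illustrative.

```perl
use strict;
use warnings;
use 5.010;

# The :libxml tag exports the DOM node-type constants.
use XML::LibXML qw(:libxml);

# Parse WITHOUT no_blanks, so <value> keeps its whitespace text nodes.
my $dom = XML::LibXML->load_xml(string => <<'END_XML');
<mds>
  <md>
    <value>
      <![CDATA[<?xml version="1.0" encoding="UTF-8"?><record>...</record>]]>
    </value>
  </md>
</mds>
END_XML

my ($value) = $dom->findnodes('//mds/md/value');

# Alternative 1: keep only the children that are CDATA sections.
my @cdata = grep { $_->nodeType == XML_CDATA_SECTION_NODE } $value->childNodes;

# Alternative 2: text() also matches CDATA sections; the predicate
# drops nodes whose content is empty after whitespace normalisation.
my @non_blank = $value->findnodes('text()[normalize-space()]');

say scalar @cdata;
say $cdata[0]->textContent;
```

The node-type filter does not depend on the position of the CDATA section among its siblings, which addresses the concern about relying on firstChild->nextSibling. The XPath variant would also return ordinary non-blank text nodes, so it fits only when value holds nothing but the CDATA section and whitespace.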