Perl XML::LibXML: getting data outside of tags

Lka*_*abo 1 xml perl xml-libxml

As a follow-up to my previous question (Perl XML::LibXML getting info from specific nodes):

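Given the following XML data, I can't figure out how to get the data that appears after the <tab/> tag (which has no closing tag) without also getting all the data from the child nodes within that section. See below for more details: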

XML example:

<title number="3">
<catchline>Uniform Agricultural Cooperative Association Act</catchline>
<chapter number="3-1">
<catchline>
General Provisions Relating to Agricultural Cooperative Associations
</catchline>
<section number="3-1-1">
<histories>
<history>
Amended by Chapter
<modchap sess="2010GS">378</modchap>
, 2010 General Session
</history>
<modyear>2010</modyear>
</histories>
<catchline>Declaration of policy.</catchline>
<tab/>
It is the declared policy of this state, as one means of improving the economic position of agriculture, to encourage the organization of producers of agricultural products into effective associations under the control of such producers, and to that end this act shall be liberally construed. THIS IS THE DATA THAT I WANT TO GET
</section>
<section number="3-1-1.1">
<histories>
<history>
Amended by Chapter
<modchap sess="1996GS">79</modchap>
, 1996 General Session
</history>
<modyear>1996</modyear>
</histories>
<catchline>General corporation laws do not apply.</catchline>
<tab/>
<xref depth="1" refnumber="16-10a" start="0">
Title 16, Chapter 10a, Utah Revised Business Corporation Act
</xref>
, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections
<xref depth="3" refnumber="3-1-13.4" start="0">3-1-13.4</xref>
,
<xref depth="3" refnumber="3-1-13.7" start="0">3-1-13.7</xref>
, and
<xref depth="3" refnumber="3-1-16.1" start="0">3-1-16.1</xref>
.
</section>
</chapter>
</title>

Here is my current Perl script:

#!/usr/bin/perl

use strict;
use warnings;

use XML::LibXML;

my $dom = XML::LibXML->load_xml(location => "file.xml");
my $titleName = $dom->findvalue('/title/catchline');
print "Title $titleName\n";

my @chapters = $dom->findnodes('/title/chapter');

for my $chapter (@chapters) {
        my $chapterNo = $chapter->getAttribute('number');
        my $chapterName = $chapter->findvalue('catchline');
        print " Chapter #$chapterNo - $chapterName\n";

        my @sections = $chapter->findnodes('section');

        for my $section (@sections) {
                my $sectionNo = $section->getAttribute('number');
                my $sectionName = $section->findvalue('catchline');
                # textContent concatenates every text node under <section>,
                # including those inside <histories> and <catchline>
                my $sectionData = $section->textContent;
                print "  Section #$sectionNo - $sectionName\nSECDATA: $sectionData\n\n";
        }
}


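This works, but it does exactly what I asked of it: $sectionData ends up holding everything inside <section>.

What I want is only the data after the <tab/> tag, without anything else that sits inside tags, such as the child tags <histories>, <history>, <xref>, and so on.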

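For example, the string:

, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections

is not contained in any specific tag, so how do I get that data?

The current output is: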

Title Uniform Agricultural Cooperative Association Act
 Chapter #3-1 - 
General Provisions Relating to Agricultural Cooperative Associations

  Section #3-1-1 - Declaration of policy.
SECDATA: 


Amended by Chapter
378
, 2010 General Session

2010

Declaration of policy.

It is the declared policy of this state, as one means of improving the economic position of agriculture, to encourage the organization of producers of agricultural products into effective associations under the control of such producers, and to that end this act shall be liberally construed.


  Section #3-1-1.1 - General corporation laws do not apply.
SECDATA: 


Amended by Chapter
79
, 1996 General Session

1996

General corporation laws do not apply.


Title 16, Chapter 10a, Utah Revised Business Corporation Act

, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections
3-1-13.4
,
3-1-13.7
, and
3-1-16.1
.

But what I'm looking for is more like:

Title Uniform Agricultural Cooperative Association Act
 Chapter #3-1 - 
General Provisions Relating to Agricultural Cooperative Associations

  Section #3-1-1 - Declaration of policy.
SECDATA: 
It is the declared policy of this state, as one means of improving the economic position of agriculture, to encourage the organization of producers of agricultural products into effective associations under the control of such producers, and to that end this act shall be liberally construed.


  Section #3-1-1.1 - General corporation laws do not apply.
SECDATA: 
, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections

ike*_*ami 5

If you want all the nodes (i.e. both elements and text nodes) that follow the tab element, you can use the following:

my @post_tab_nodes = $section_node->findnodes('tab/following-sibling::node()');

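Rendering the resulting nodes as text is left as an exercise for the reader. You can tell element nodes from text nodes with $node->nodeType, which returns XML_ELEMENT_NODE and XML_TEXT_NODE respectively for those node types (both constants are exported by XML::LibXML).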
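
One possible way to do that rendering is sketched below. It is not part of the original answer: the helper name post_tab_text, its $include_children flag, and the whitespace normalization are all assumptions made for illustration, and the inline XML is a shortened stand-in for the section in the question. The sketch walks the nodes returned by the XPath expression above and keeps or skips each one based on nodeType.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;   # exports XML_TEXT_NODE and XML_ELEMENT_NODE

# Hypothetical helper: return the text that follows <tab/> in a section.
# If $include_children is true, text inside child elements such as <xref>
# is kept; otherwise only text nodes directly under <section> are used.
sub post_tab_text {
    my ($section, $include_children) = @_;
    my @parts;
    for my $node ($section->findnodes('tab/following-sibling::node()')) {
        my $type = $node->nodeType;
        if ($type == XML_TEXT_NODE) {
            push @parts, $node->nodeValue;
        }
        elsif ($type == XML_ELEMENT_NODE && $include_children) {
            push @parts, $node->textContent;
        }
    }
    my $text = join '', @parts;
    $text =~ s/\s+/ /g;        # collapse runs of whitespace, including newlines
    $text =~ s/^\s+|\s+$//g;   # trim both ends
    return $text;
}

# Shortened sample data, modeled on section 3-1-1.1 from the question
my $xml = <<'XML';
<section number="3-1-1.1">
<catchline>General corporation laws do not apply.</catchline>
<tab/>
<xref depth="1" refnumber="16-10a" start="0">Title 16</xref>
, does not apply, except as provided in Sections
<xref depth="3" refnumber="3-1-13.4" start="0">3-1-13.4</xref>
.
</section>
XML

my $section = XML::LibXML->load_xml(string => $xml)->documentElement;
print post_tab_text($section, 0), "\n";   # text outside any child tag
print post_tab_text($section, 1), "\n";   # same, plus the <xref> contents
```

In the question's script, the call would replace $section->textContent inside the inner loop. If only the untagged text is ever needed, the XPath tab/following-sibling::text() should select the same text nodes without the nodeType check.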