Reading an XML file without any XML module

raj*_*raj -1 xml perl perl-module xml-parsing

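I am trying to read an XML form using Perl, but I cannot use any XML module such as XML::Simple or XML::Parse.

It is a simple XML form containing some basic information and an MS Doc attachment. I want to read this XML, download the attached Doc file, and then print the XML information on screen.

But I have no idea how to do this without an XML module. I have heard that XML files can be parsed using Data::Dumper, but I am not familiar with that module, so I could not work out how to do it.

Is there any way to do this without an XML module? Any help would be appreciated.

Sample XML: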

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
</catalog>

Sob*_*que 5

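I want to reiterate that this is a bad idea. XML looks like plain text, but it is not plain text. If you treat it as such, you will create brittle, unmaintainable, and unsupportable code that may break one day because someone changes the XML formatting in a perfectly valid way.

I would strongly suggest that your first port of call is to go back to your project and point out that parsing XML without an XML parser is like trying to use a hammer to put a screw into a piece of wood. It sort of works, but the result is rather shoddy and, frankly, completely unnecessary, because screwdrivers exist, do the job properly and easily, and are widely available.

For example:

Could you tell me how to print the author, title, and price for each book id in the above XML file using an XML module?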

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;
my $twig = XML::Twig -> new -> parsefile ( 'your_file.xml' );
foreach my $book ( $twig -> get_xpath ( '//book' ) ) {
    print join ("\n", 
         $book -> att('id'),
         $book -> field('author'),
         $book -> field('title'),
         $book -> field('price'), ),"\n----\n";
}

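However:

Given your specific sample, you can treat it as 'plain text'. Before you do, you should point out to your project lead that this is a risky approach: you are driving in screws with a hammer, and so creating an ongoing risk of support problems, all of which is trivially solved by installing a bit of freely available open source code.

I am only suggesting this at all because I have had to deal with similarly ludicrous and unreasonable project requirements.

Like this: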

#!/usr/bin/env perl
use strict;
use warnings;

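# Read the input one line at a time; this assumes one element per line.
while ( <> ) {
    if ( m/<book/ ) {
        # Capture the id attribute from the opening <book> tag.
        my ( $id ) = ( m/id="(\w+)"/ );
        print $id,"\n";
    }
    if ( m/<author/ ) {
        # Greedy match: everything between the first '>' and the last '<' on the line.
        my ( $author ) = ( m/>(.*)</ );
        print $author,"\n";
    }
}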

Now, the reason this does not work is that the sample above can perfectly validly be formatted as:

<?xml version="1.0"?>
<catalog><book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price><publish_date>2000-10-01</publish_date><description>An in-depth look at creating applications 
      with XML.</description></book><book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price><publish_date>2000-12-16</publish_date><description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description></book></catalog>

Or:

<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>An in-depth look at creating applications 
      with XML.</description>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-12-16</publish_date>
    <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
  </book>
</catalog>

Or:

<?xml version="1.0"?>
<catalog
><book
id="bk101"
><author
>Gambardella, Matthew</author><title
>XML Developer's Guide</title><genre
>Computer</genre><price
>44.95</price><publish_date
>2000-10-01</publish_date><description
>An in-depth look at creating applications 
      with XML.</description></book><book
id="bk102"
><author
>Ralls, Kim</author><title
>Midnight Rain</title><genre
>Fantasy</genre><price
>5.95</price><publish_date
>2000-12-16</publish_date><description
>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description></book></catalog>

Or:

<?xml version="1.0"?>

<catalog>
  <book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price><publish_date>2000-10-01</publish_date><description>An in-depth look at creating applications 
      with XML.</description></book>
  <book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price><publish_date>2000-12-16</publish_date><description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description></book>
</catalog>

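This is why you have so many comments saying 'use a parser': of the snippets above, the simple example I gave you will only work on one of them and will break messily on the others.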
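To make that failure concrete, here is a minimal sketch that runs the same two regexes from the line-based script on two layouts of the same data, using only core Perl. The sub name `extract_ids_and_authors` and the trimmed-down two-book documents are my own, made up for illustration; the matching logic is unchanged.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: the same per-line matching as the script above,
# but collecting results in a list instead of printing them.
sub extract_ids_and_authors {
    my ($xml) = @_;
    my @found;
    for ( split /\n/, $xml ) {
        if ( m/<book/ ) {
            my ($id) = ( m/id="(\w+)"/ );
            push @found, $id;
        }
        if ( m/<author/ ) {
            my ($author) = ( m/>(.*)</ );
            push @found, $author;
        }
    }
    return @found;
}

# One element per line, as in the question.
my $pretty = <<'XML';
<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
  </book>
</catalog>
XML

# Exactly the same data with the line breaks removed.
my $compact = <<'XML';
<?xml version="1.0"?>
<catalog><book id="bk101"><author>Gambardella, Matthew</author></book><book id="bk102"><author>Ralls, Kim</author></book></catalog>
XML

for my $case ( [ pretty => $pretty ], [ compact => $compact ] ) {
    my ( $name, $xml ) = @$case;
    my @fields = extract_ids_and_authors($xml);
    print "$name: ", scalar(@fields), " field(s)\n";
    print "  $_\n" for @fields;
}
```

On the pretty-printed input this yields the four expected fields (two ids, two authors). On the compact input it yields only two: the first id, and an "author" that is really a long run of markup, because the greedy `.*` spans from the first `>` to the last `<` on the line. The second book is never reported at all.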

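But the XML::Twig solution handles all of them correctly. XML::Twig is freely available on CPAN. (There are other libraries that do the job just as well.) It also comes pre-packaged in the default repositories of many operating systems.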