How do I validate XML with multiple namespaces in Python?

Jim*_*Jim 5 python xml validation xsd

I'm trying to write some unit tests in Python 2.7 that validate responses against some extensions I've made to the OAI-PMH schema: http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd

The problem I'm running into involves multiple nested namespaces, and it is caused by this declaration in the XSD above:

<complexType name="metadataType">
    <annotation>
        <documentation>Metadata must be expressed in XML that complies
        with another XML Schema (namespace=#other). Metadata must be 
        explicitly qualified in the response.</documentation>
    </annotation>
    <sequence>
        <any namespace="##other" processContents="strict"/>
    </sequence>
</complexType>

Here's a snippet of the code I'm using:

import urllib2

from lxml import etree

query = "http://localhost:8080/OAI-PMH?verb=GetRecord&by_doc_ID=false&metadataPrefix=nsdl_dc&identifier=http://www.purplemath.com/modules/ratio.htm"
schema_file = file("../schemas/OAI/2.0/OAI-PMH.xsd", "r")
schema_doc = etree.parse(schema_file)
oaischema = etree.XMLSchema(schema_doc)

request = urllib2.Request(query, headers=xml_headers)
response = urllib2.urlopen(request)
body = response.read()
response_doc = etree.fromstring(body)

try:
    oaischema.assertValid(response_doc)
except etree.DocumentInvalid as e:
    # Dump the response with line numbers so the reported line can be located.
    for line, text in enumerate(body.split("\n"), 1):
        print "{0}\t{1}".format(line, text)
    print(e.message)

I end up with the following error:

AssertionError: http://localhost:8080/OAI-PMH?verb=GetRecord&by_doc_ID=false&metadataPrefix=nsdl_dc&identifier=http://www.purplemath.com/modules/ratio.htm
Element '{http://www.openarchives.org/OAI/2.0/oai_dc/}oai_dc': No matching global element declaration available, but demanded by the strict wildcard., line 22

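I understand the error: the schema requires that the child element of the metadata element be strictly validated, and the sample XML has such a child element.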
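To see exactly which element the strict wildcard is being asked to match, it can help to list the namespace-qualified names that sit directly under `<metadata>`. The following is a minimal sketch using only the standard library; the `metadata_child_names` helper and the trimmed `SAMPLE` document are illustrative assumptions, not part of the original test code.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Trimmed-down stand-in for a GetRecord response (illustrative only).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def metadata_child_names(xml_text):
    """Return the {namespace}localname tag of each direct child of <metadata>."""
    root = ET.fromstring(xml_text)
    names = []
    for metadata in root.iter("{%s}metadata" % OAI_NS):
        for child in metadata:
            names.append(child.tag)
    return names


print(metadata_child_names(SAMPLE))
# ['{http://www.openarchives.org/OAI/2.0/oai_dc/}dc']
```

Each name printed this way is an element for which the validator needs a global element declaration, so its namespace has to be covered by some schema that the validator has loaded.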

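I've already written a working validator in Java. However, it would help to validate in Python as well, since the rest of the solution I'm building is Python based. To make the Java variant work, I had to make my DocumentFactory namespace aware; otherwise I got the same error. I haven't found any working examples in Python that perform this validation correctly.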

Does anyone know how I can get an XML document with multiple nested namespaces, like my sample document, to validate with Python?

Here is the sample XML document that I'm trying to validate:

<?xml version="1.0" encoding="UTF-8"?> 
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
     http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord" identifier="oai:arXiv.org:cs/0112017"
       metadataPrefix="oai_dc">http://arXiv.org/oai2</request>
  <GetRecord>
   <record> 
    <header>
      <identifier>oai:arXiv.org:cs/0112017</identifier> 
      <datestamp>2001-12-14</datestamp>
      <setSpec>cs</setSpec> 
      <setSpec>math</setSpec>
    </header>
    <metadata>
      <oai_dc:dc 
     xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" 
     xmlns:dc="http://purl.org/dc/elements/1.1/" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
     xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ 
     http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
    <dc:title>Using Structural Metadata to Localize Experience of 
          Digital Content</dc:title> 
    <dc:creator>Dushay, Naomi</dc:creator>
    <dc:subject>Digital Libraries</dc:subject> 
    <dc:description>With the increasing technical sophistication of 
        both information consumers and providers, there is 
        increasing demand for more meaningful experiences of digital 
        information. We present a framework that separates digital 
        object experience, or rendering, from digital object storage 
        and manipulation, so the rendering can be tailored to 
        particular communities of users.
    </dc:description> 
    <dc:description>Comment: 23 pages including 2 appendices, 
        8 figures</dc:description> 
    <dc:date>2001-12-14</dc:date>
      </oai_dc:dc>
    </metadata>
  </record>
 </GetRecord>
</OAI-PMH>

Nei*_*tos 0

I found this in lxml's validation documentation:

>>> schema_root = etree.XML('''\
...   <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
...     <xsd:element name="a" type="xsd:integer"/>
...   </xsd:schema>
... ''')
>>> schema = etree.XMLSchema(schema_root)

>>> parser = etree.XMLParser(schema = schema)
>>> root = etree.fromstring("<a>5</a>", parser)

So perhaps this is what you need? (See the last two lines.)

schema_doc = etree.parse(schema_file)
oaischema = etree.XMLSchema(schema_doc)

request = urllib2.Request(query, headers=xml_headers)
response = urllib2.urlopen(request)
body = response.read()
parser = etree.XMLParser(schema = oaischema)
response_doc = etree.fromstring(body, parser)
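A different approach that may be worth trying, and that is not confirmed anywhere in this thread, is to give the validator a declaration for the metadata namespace as well. A strict `##other` wildcard can only be satisfied if a global element declaration for the child element is available, so one option is a small wrapper schema that uses `xs:import` to pull in both schemas. The sketch below uses tiny stand-in schemas; the `urn:example:*` namespaces, the file names, and the `build_schemas` helper are all hypothetical. For the real case, the two imports would presumably point at local copies of OAI-PMH.xsd and oai_dc.xsd instead.

```python
import os
import tempfile

from lxml import etree

# Hypothetical stand-in for OAI-PMH.xsd: <metadata> holds one strict ##other child.
OUTER_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:example:outer" elementFormDefault="qualified">
  <xs:element name="metadata">
    <xs:complexType>
      <xs:sequence>
        <xs:any namespace="##other" processContents="strict"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Hypothetical stand-in for oai_dc.xsd: declares the element used inside <metadata>.
INNER_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:example:inner" elementFormDefault="qualified">
  <xs:element name="dc" type="xs:string"/>
</xs:schema>"""

# Wrapper schema with no target namespace; it only imports the other two.
WRAPPER_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:import namespace="urn:example:outer" schemaLocation="outer.xsd"/>
  <xs:import namespace="urn:example:inner" schemaLocation="inner.xsd"/>
</xs:schema>"""

DOC = b"""<metadata xmlns="urn:example:outer">
  <i:dc xmlns:i="urn:example:inner">hello</i:dc>
</metadata>"""


def build_schemas():
    """Write the schemas to a temp dir so relative schemaLocation values resolve."""
    workdir = tempfile.mkdtemp()
    files = (("outer.xsd", OUTER_XSD),
             ("inner.xsd", INNER_XSD),
             ("wrapper.xsd", WRAPPER_XSD))
    for name, text in files:
        with open(os.path.join(workdir, name), "w") as handle:
            handle.write(text)
    outer_only = etree.XMLSchema(etree.parse(os.path.join(workdir, "outer.xsd")))
    combined = etree.XMLSchema(etree.parse(os.path.join(workdir, "wrapper.xsd")))
    return outer_only, combined


outer_only, combined = build_schemas()
doc = etree.fromstring(DOC)
print(outer_only.validate(doc))  # outer schema alone: no declaration for i:dc
print(combined.validate(doc))    # wrapper schema: both namespaces are declared
```

With the outer schema alone, validation should fail with the same "No matching global element declaration" complaint as in the question; with the wrapper it should pass, because the imported inner schema supplies the missing declaration. If validation fails, `combined.error_log` holds the details.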