Creating a doctype with lxml's etree

Mar*_*ijn 14 python doctype lxml elementtree

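I want to add a doctype to the XML documents I generate with lxml's etree.

However, I can't figure out how to add a doctype. Hard-coding and concatenating strings is not an option.

I was expecting something along the lines of how PIs are added in etree: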

pi = etree.PI(...)
doc.addprevious(pi)

But this doesn't work for me. How do I add a doctype to an XML document with lxml?

小智 30

This worked for me:

__CODE__
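
As a minimal sketch of one option that lxml does provide: `etree.tostring` accepts a `doctype` keyword argument, which emits the given declaration ahead of the root element at serialization time. Whether this is the call the answer above used is an assumption, and the element names and the `<!DOCTYPE root>` string below are purely illustrative.

```python
from lxml import etree

# Hypothetical document; the element names are illustrative only.
root = etree.Element("root")
etree.SubElement(root, "a").text = "whatever"

# The doctype string is emitted ahead of the root element in the output.
out = etree.tostring(
    root,
    xml_declaration=True,
    encoding="UTF-8",
    doctype="<!DOCTYPE root>",
)
print(out.decode("utf-8"))
```

In this sketch the doctype exists only in the serialized bytes; the in-memory tree is left unchanged.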


小智 9

You can create the document with a doctype to begin with:

# Adapted from example on http://codespeak.net/lxml/tutorial.html
import io
import lxml.etree as et

# Use bytes so the parser honours the encoding declaration in the document.
s = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE root SYSTEM "test" [ <!ENTITY tasty "cheese">
<!ENTITY eacute "&#233;"> ]>
<root>
<a>&tasty; souffl&eacute;</a>
</root>
"""
tree = et.parse(io.BytesIO(s))
# tostring() returns bytes when an encoding is given, so decode for printing.
print(et.tostring(tree, xml_declaration=True, encoding="utf-8").decode("utf-8"))

This prints:

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE root SYSTEM "test" [
<!ENTITY tasty "cheese">
<!ENTITY eacute "&#233;">
]>
<root>
<a>cheese soufflé</a>
</root>

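If you want to add a doctype to some XML that was not created with one, you can first create a document with the desired doctype (as above), then copy the doctype-less XML into it: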

xml = et.XML("<root><test/><a>whatever</a><end_test/></root>")
root = tree.getroot()
root[:] = xml  # replace root's children with the children of xml
root.text, root.tail = xml.text, xml.tail
print(et.tostring(tree, xml_declaration=True, encoding="utf-8").decode("utf-8"))

This prints:

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE root SYSTEM "test" [
<!ENTITY tasty "cheese">
<!ENTITY eacute "&#233;">
]>
<root><test/><a>whatever</a><end_test/></root>

Is this what you're looking for?
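
The copy-into-a-doctype-bearing-tree approach above can be wrapped in a small helper, and the result checked through the tree's `docinfo` property. This is only a sketch under stated assumptions: the helper name `graft_onto_doctype` and the stub document are hypothetical, and only the root's text and children are carried over, not its attributes.

```python
import io
from lxml import etree


def graft_onto_doctype(stub, source):
    """Parse `stub` (a minimal document carrying the doctype) and move
    the content of `source` into its root element. Hypothetical helper."""
    tree = etree.parse(io.BytesIO(stub))
    root = tree.getroot()
    root[:] = list(source)   # moves the children of `source` under root
    root.text = source.text  # carry over leading text, if any
    return tree


tree = graft_onto_doctype(
    b'<!DOCTYPE root SYSTEM "test"><root/>',
    etree.XML("<root><test/><a>whatever</a><end_test/></root>"),
)

# docinfo exposes the doctype that the stub document carried.
print(tree.docinfo.doctype)
print(tree.docinfo.system_url)
print(etree.tostring(tree).decode("utf-8"))
```

Reading `docinfo` is a convenient way to confirm that the doctype is attached to the tree itself rather than only to one serialized output.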


小智 5

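Actually, the PI is added as a preceding sibling of "doc", so it is not a child of "doc". To see it in the output, you have to serialize "doc.getroottree()" rather than "doc" itself.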

Here is an example:

>>> from lxml import etree
>>> root = etree.Element("root")
>>> a = etree.SubElement(root, "a")
>>> b = etree.SubElement(root, "b")
>>> root.addprevious(etree.PI('xml-stylesheet', 'type="text/xsl" href="my.xsl"'))
>>> print(etree.tostring(root, pretty_print=True, xml_declaration=True, encoding='utf-8').decode('utf-8'))
<?xml version='1.0' encoding='utf-8'?>
<root>
  <a/>
  <b/>
</root>

Using getroottree():

>>> print(etree.tostring(root.getroottree(), pretty_print=True, xml_declaration=True, encoding='utf-8').decode('utf-8'))
<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet type="text/xsl" href="my.xsl"?>
<root>
  <a/>
  <b/>
</root>
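
To tie this back to the original question, here is a sketch that serializes the whole tree so the PI is included, and also requests a doctype in the same call. Passing `doctype=` to `etree.tostring` is an assumption brought in for illustration, not something this answer uses, and the `<!DOCTYPE root>` string is hypothetical.

```python
from lxml import etree

root = etree.Element("root")
etree.SubElement(root, "a")
# The PI becomes a preceding sibling of the root element, not a child.
root.addprevious(etree.PI("xml-stylesheet", 'type="text/xsl" href="my.xsl"'))

# Serializing the element alone leaves out its document-level siblings.
element_only = etree.tostring(root)

# Serializing the whole tree includes them; doctype= is illustrative here.
whole_tree = etree.tostring(
    root.getroottree(),
    xml_declaration=True,
    encoding="utf-8",
    doctype="<!DOCTYPE root>",
)
print(whole_tree.decode("utf-8"))
```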