用于Python的XML编写工具

Vin*_*vic 35 python xml xhtml

我正在尝试使用ElementTree,它看起来很好,它可以逃脱HTML实体等等.我错过了一些我没听说过的真正精彩的东西吗?

这与我实际做的类似:

import xml.etree.ElementTree as ET
root = ET.Element('html')
head = ET.SubElement(root,'head')
script = ET.SubElement(head,'script')
script.set('type','text/javascript')
script.text = "var a = 'I love á letters'"
body = ET.SubElement(root,'body')
h1 = ET.SubElement(body,'h1')
h1.text = "And I like the fact that 3 > 1"
tree = ET.ElementTree(root)
tree.write('foo.xhtml')

# more foo.xhtml
<html><head><script type="text/javascript">var a = 'I love &amp;aacute;
letters'</script></head><body><h1>And I like the fact that 3 &gt; 1</h1>
</body></html>
Run Code Online (Sandbox Code Playgroud)

Pet*_*ann 28

另一种方法是使用lxml中的E Factory构建器(也可以在Elementtree中使用)

>>> from lxml import etree

>>> from lxml.builder import E

>>> def CLASS(*args): # class is a reserved word in Python
...     return {"class":' '.join(args)}

>>> html = page = (
...   E.html(       # create an Element called "html"
...     E.head(
...       E.title("This is a sample document")
...     ),
...     E.body(
...       E.h1("Hello!", CLASS("title")),
...       E.p("This is a paragraph with ", E.b("bold"), " text in it!"),
...       E.p("This is another paragraph, with a", "\n      ",
...         E.a("link", href="http://www.python.org"), "."),
...       E.p("Here are some reserved characters: <spam&egg>."),
...       etree.XML("<p>And finally an embedded XHTML fragment.</p>"),
...     )
...   )
... )

>>> print(etree.tostring(page, pretty_print=True))
<html>
  <head>
    <title>This is a sample document</title>
  </head>
  <body>
    <h1 class="title">Hello!</h1>
    <p>This is a paragraph with <b>bold</b> text in it!</p>
    <p>This is another paragraph, with a
      <a href="http://www.python.org">link</a>.</p>
    <p>Here are some reservered characters: &lt;spam&amp;egg&gt;.</p>
    <p>And finally an embedded XHTML fragment.</p>
  </body>
</html>
Run Code Online (Sandbox Code Playgroud)


oas*_*bob 24

总有SimpleXMLWriter,它是ElementTree工具包的一部分.界面很简单.

这是一个例子:

from elementtree.SimpleXMLWriter import XMLWriter
import sys

w = XMLWriter(sys.stdout)
html = w.start("html")

w.start("head")
w.element("title", "my document")
w.element("meta", name="generator", value="my application 1.0")
w.end()

w.start("body")
w.element("h1", "this is a heading")
w.element("p", "this is a paragraph")

w.start("p")
w.data("this is ")
w.element("b", "bold")
w.data(" and ")
w.element("i", "italic")
w.data(".")
w.end("p")

w.close(html)
Run Code Online (Sandbox Code Playgroud)


Eli*_*ght 10

我假设你实际上是在创建一个XML DOM树,因为你想要验证进入这个文件的是有效的XML,因为否则你只需要在文件中写一个静态字符串.如果确认你的输出确实是你的目标,那么我建议

from xml.dom.minidom import parseString

doc = parseString("""<html>
    <head>
        <script type="text/javascript">
            var a = 'I love &amp;aacute; letters'
        </script>
    </head>
    <body>
        <h1>And I like the fact that 3 &gt; 1</h1>
    </body>
    </html>""")

with open("foo.xhtml", "w") as f:
    f.write( doc.toxml() )
Run Code Online (Sandbox Code Playgroud)

这使您可以只编写要输出的XML,验证它是否正确(因为如果它无效,parseString将引发异常)并让您的代码看起来更好.

据推测,您不仅每次都编写相同的静态XML,而且还需要进行一些替换.在这种情况下,我会有类似的行

var a = '%(message)s'
Run Code Online (Sandbox Code Playgroud)

然后使用%运算符进行替换,例如

</html>""" % {"message": "I love &amp;aacute; letters"})
Run Code Online (Sandbox Code Playgroud)


Mik*_*bov 7

https://github.com/galvez/xmlwitch:

import xmlwitch
xml = xmlwitch.Builder(version='1.0', encoding='utf-8')
with xml.feed(xmlns='http://www.w3.org/2005/Atom'):
    xml.title('Example Feed')
    xml.updated('2003-12-13T18:30:02Z')
    with xml.author:
        xml.name('John Doe')
    xml.id('urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6')
    with xml.entry:
        xml.title('Atom-Powered Robots Run Amok')
        xml.id('urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a')
        xml.updated('2003-12-13T18:30:02Z')
        xml.summary('Some text.')
print(xml)
Run Code Online (Sandbox Code Playgroud)