>>> from lxml.etree import HTML, tostring
>>> tostring(HTML('<fb:like>'))
'<html><body><like/></body></html>'
Note how the <fb:like> tag becomes simply <like>.
This makes it harder to process pages containing XFBML with lxml. (The same thing happens with <g:plusone></g:plusone>.)
Any help is appreciated.
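One possible workaround, offered as a sketch rather than a confirmed fix: lxml's XML parser keeps a prefix when the document declares it. This only helps if the page is well-formed XHTML that declares the namespace. The sample markup and the `http://ogp.me/ns/fb#` URI below are assumptions for illustration, not taken from the question.

```python
from lxml import etree

# Assumed namespace URI for the fb: prefix; use whatever the page declares.
FB_NS = "http://ogp.me/ns/fb#"

# Hypothetical, well-formed XHTML input that declares the prefix.
page = (
    '<html xmlns:fb="%s">'
    '<body><fb:like></fb:like></body>'
    '</html>' % FB_NS
)

# Parse with the XML parser instead of the HTML parser.
root = etree.fromstring(page)

# Namespaced elements are addressed as {uri}localname.
like = root.find(".//{%s}like" % FB_NS)
print(like.tag)
print(like.prefix)

# The prefix survives serialization because the namespace is declared.
serialized = etree.tostring(root)
print(serialized)
```

Real-world pages that are not well-formed XML will fail to parse this way, so this does not replace the HTML parser for arbitrary markup.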
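I wrote a script that prints out all of the .xml files in the current directory as XML, but I can't figure out how to add the xmlns attributes to the top-level tag. The output I want is: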
<?xml version='1.0' encoding='utf-8'?>
<databaseChangeLog
xmlns="http://www.host.org/xml/ns/dbchangelog"
xmlns:xsi="http://www.host.org/2001/XMLSchema-instance"
xsi:schemaLocation="www.host.org/xml/ns/dbchangelog">
<include file="cats.xml"/>
<include file="dogs.xml"/>
<include file="fish.xml"/>
<include file="meerkats.xml"/>
</databaseChangeLog>
However, this is the output I get:
<?xml version='1.0' encoding='utf-8'?>
<databaseChangeLog>
<include file="cats.xml"/>
<include file="dogs.xml"/>
<include file="fish.xml"/>
<include file="meerkats.xml"/>
</databaseChangeLog>
Here is my script:
import lxml.etree
import lxml.builder
import glob
E = lxml.builder.ElementMaker()
ROOT = E.databaseChangeLog
DOC = E.include
# grab all the xml files
files = [DOC(file=f) for f in glob.glob("*.xml")]
the_doc = ROOT(*files)
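# avoid shadowing the built-in str; tostring() returns bytes when an encoding is given
xml_bytes = lxml.etree.tostring(the_doc, pretty_print=True, xml_declaration=True, encoding='utf-8')
print(xml_bytes.decode('utf-8'))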
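I found a few examples online that set the namespace attributes explicitly, here and here, but to be honest they went a bit over my head since I'm just getting started. Is there another way to add these xmlns attributes to the databaseChangeLog tag?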
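One way this could be approached, as a minimal sketch: `ElementMaker` accepts `namespace` and `nsmap` arguments, so the default namespace and the `xsi` prefix can be declared once, and the `xsi:schemaLocation` attribute can then be set with a `QName`. The helper name `build_changelog` and the hard-coded file list are hypothetical; the URIs are the ones from the desired output above.

```python
import lxml.etree
import lxml.builder

NS = "http://www.host.org/xml/ns/dbchangelog"
XSI = "http://www.host.org/2001/XMLSchema-instance"


def build_changelog(filenames):
    """Hypothetical helper: build the namespaced databaseChangeLog document."""
    # namespace= puts every created element in NS;
    # nsmap= declares NS as the default namespace and binds the xsi prefix.
    E = lxml.builder.ElementMaker(namespace=NS, nsmap={None: NS, "xsi": XSI})
    root = E.databaseChangeLog(*[E.include(file=f) for f in filenames])
    # A namespaced attribute is set through its qualified name.
    root.set(lxml.etree.QName(XSI, "schemaLocation"),
             "www.host.org/xml/ns/dbchangelog")
    return lxml.etree.tostring(root, pretty_print=True,
                               xml_declaration=True, encoding="utf-8")


xml_bytes = build_changelog(["cats.xml", "dogs.xml"])
print(xml_bytes.decode("utf-8"))
```

In the original script, the list passed in would be `glob.glob("*.xml")` rather than the fixed names used here.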
I need to produce this XML:
<s:Envelope xmlns:a="http://www.w3.org/2005/08/addressing" xmlns:s="http://www.w3.org/2003/05/soap-envelope">
<s:Header>
<a:Action s:mustUnderstand="1">Action</a:Action>
</s:Header>
</s:Envelope>
As far as I can tell, the <Action> node and its "mustUnderstand" attribute are in different namespaces. This is what I have so far:
from lxml.etree import Element, SubElement, QName, tostring
class XMLNamespaces:
    s = 'http://www.w3.org/2003/05/soap-envelope'
    a = 'http://www.w3.org/2005/08/addressing'
root = Element(QName(XMLNamespaces.s, 'Envelope'), nsmap={'s':XMLNamespaces.s, 'a':XMLNamespaces.a})
header = SubElement(root, QName(XMLNamespaces.s, 'Header'))
action = SubElement(header, QName(XMLNamespaces.a, 'Action'))
action.attrib['mustUnderstand'] = "1"
action.text = 'Action'
print(tostring(root, pretty_print=True).decode())
Result:
<s:Envelope xmlns:a="http://www.w3.org/2005/08/addressing" xmlns:s="http://www.w3.org/2003/05/soap-envelope">
<s:Header>
<a:Action mustUnderstand="1">Action</a:Action>
</s:Header>
</s:Envelope>
As we can see, the "mustUnderstand" attribute has no namespace prefix. Is it possible to get "s:mustUnderstand" with lxml? If so, how?
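A minimal sketch of one approach: an attribute is placed in a namespace the same way an element is, by giving its qualified name. Here `QName` is passed to `set()` instead of assigning the bare key `'mustUnderstand'`. The constant names `NS_S` and `NS_A` are introduced for illustration only and stand in for the `XMLNamespaces` class above.

```python
from lxml.etree import Element, SubElement, QName, tostring

NS_S = "http://www.w3.org/2003/05/soap-envelope"
NS_A = "http://www.w3.org/2005/08/addressing"

root = Element(QName(NS_S, "Envelope"), nsmap={"s": NS_S, "a": NS_A})
header = SubElement(root, QName(NS_S, "Header"))
action = SubElement(header, QName(NS_A, "Action"))

# Qualify the attribute name with the SOAP envelope namespace so that
# it is serialized with the s: prefix declared on the root element.
action.set(QName(NS_S, "mustUnderstand"), "1")
action.text = "Action"

result = tostring(root, pretty_print=True)
print(result.decode("utf-8"))
```

The same attribute can also be written with Clark notation, for example `action.attrib['{%s}mustUnderstand' % NS_S] = "1"`, which is what `QName` expands to.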