spe*_*zor 15 python xml lxml namespaces
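I have an XML file that I need to open and make some changes to. One of those changes is to remove the namespace and its prefix, and then save the result to another file. Here is the XML: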
<?xml version='1.0' encoding='UTF-8'?>
<package xmlns="http://apple.com/itunes/importer">
<provider>some data</provider>
<language>en-GB</language>
</package>
I can make the other changes I need, but I cannot find out how to remove the namespace and prefix. This is the resulting XML I need:
<?xml version='1.0' encoding='UTF-8'?>
<package>
<provider>some data</provider>
<language>en-GB</language>
</package>
Here is my script, which opens and parses the XML and then saves it:
metadata = '/Users/user1/Desktop/Python/metadata.xml'
from lxml import etree
parser = etree.XMLParser(remove_blank_text=True)
open(metadata)
tree = etree.parse(metadata, parser)
root = tree.getroot()
tree.write('/Users/user1/Desktop/Python/done.xml', pretty_print = True, xml_declaration = True, encoding = 'UTF-8')
So how can I add code to my script to remove the namespace and prefix?
fal*_*tru 26
Replace the tags as Uku Loskit suggested. In addition, use lxml.objectify.deannotate.
from lxml import etree, objectify

metadata = '/Users/user1/Desktop/Python/metadata.xml'
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(metadata, parser)
root = tree.getroot()

####
for elem in root.iter():  # iter() replaces the deprecated getiterator()
    if not hasattr(elem.tag, 'find'): continue  # (1)
    i = elem.tag.find('}')
    if i >= 0:
        elem.tag = elem.tag[i+1:]  # keep only the local name after '{uri}'
objectify.deannotate(root, cleanup_namespaces=True)
####

tree.write('/Users/user1/Desktop/Python/done.xml',
           pretty_print=True, xml_declaration=True, encoding='UTF-8')
UPDATE
Some nodes, such as Comment, return a function when their tag attribute is accessed. A guard was added for that case. (1)
Ser*_*kov 24
>>> root.tag
'{http://latest/nmc-omc/cmNrm.doc#measCollec}measCollecFile'
>>> etree.QName(root.tag).localname
'measCollecFile'
Addendum: lxml.etree.QName also accepts an element on construction, so etree.QName(root.tag).localname is equivalent to:
etree.QName(root).localname
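
Building on QName, stripping every namespace from a tree could be sketched roughly as follows. The helper name strip_namespaces and the inline sample document are assumptions made for illustration. The isinstance check skips comments and processing instructions, whose tag is not a string, and attribute names are left untouched.

```python
from lxml import etree


def strip_namespaces(root):
    """Rename every element to its local name, in place (hypothetical helper)."""
    for elem in root.iter():
        # Comments and processing instructions expose a callable tag, not a string.
        if not isinstance(elem.tag, str):
            continue
        elem.tag = etree.QName(elem).localname
    # Drop namespace declarations that no element uses any more.
    etree.cleanup_namespaces(root)
    return root


# Sample input modelled on the question's document, with a comment added.
xml = (b'<package xmlns="http://apple.com/itunes/importer">'
       b'<!-- sample --><provider>some data</provider>'
       b'<language>en-GB</language></package>')
root = strip_namespaces(etree.fromstring(xml))
result = etree.tostring(root).decode()
print(result)
```

The same function should work on tree.getroot() from a parsed file before calling tree.write().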
You can also use XSLT to strip namespaces...
XSLT 1.0 (test.xsl)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*" priority="1">
    <xsl:element name="{local-name()}" namespace="">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}" namespace="">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
Python
from lxml import etree
tree = etree.parse("metadata.xml")
xslt = etree.parse("test.xsl")
new_tree = tree.xslt(xslt)
print(etree.tostring(new_tree, pretty_print=True, xml_declaration=True,
encoding="UTF-8").decode("UTF-8"))
Output
<?xml version='1.0' encoding='UTF-8'?>
<package>
<provider>some data</provider>
<language>en-GB</language>
</package>
import xml.etree.ElementTree as ET

def remove_namespace(doc, namespace):
    """Remove namespace in the passed document in place."""
    ns = u'{%s}' % namespace
    nsl = len(ns)
    for elem in doc.iter():  # getiterator() was removed in Python 3.9
        if elem.tag.startswith(ns):
            elem.tag = elem.tag[nsl:]

metadata = '/Users/user1/Desktop/Python/metadata.xml'
tree = ET.parse(metadata)
root = tree.getroot()
remove_namespace(root, u'http://apple.com/itunes/importer')
# ElementTree.write() has no pretty_print argument (that option is lxml-only)
tree.write('/Users/user1/Desktop/Python/done.xml',
           xml_declaration=True, encoding='UTF-8')
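This uses a piece of code from here. The method can easily be extended to remove any namespace attributes by searching for tags that start with "xmlns".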
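
One possible way to extend this is sketched below. ElementTree does not expose xmlns declarations as ordinary attributes, so the sketch instead strips the '{uri}' part from attribute names as well as tags. The helper names, the example.com namespace and the x:id attribute are assumptions made for illustration. Note that two attributes with the same local name in different namespaces would collide after stripping.

```python
import xml.etree.ElementTree as ET


def local_name(name):
    """Return the part after '{uri}' in an ElementTree name, or the name unchanged."""
    return name.rsplit('}', 1)[-1]


def strip_all_namespaces(root):
    """Strip namespaces from tags and attribute names in place (hypothetical helper)."""
    for elem in root.iter():
        if isinstance(elem.tag, str):
            elem.tag = local_name(elem.tag)
        # Rebuild the attribute mapping with unqualified names.
        attrs = {local_name(key): value for key, value in elem.attrib.items()}
        elem.attrib.clear()
        elem.attrib.update(attrs)
    return root


# Sample input: the question's document plus a made-up namespaced attribute.
xml = ('<package xmlns="http://apple.com/itunes/importer" '
       'xmlns:x="http://example.com/x">'
       '<provider x:id="1">some data</provider>'
       '<language>en-GB</language></package>')
root = strip_all_namespaces(ET.fromstring(xml))
result = ET.tostring(root, encoding='unicode')
print(result)
```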