Preserving existing namespaces when overwriting an XML file with ElementTree and Python

Sac*_*nth 5 python xml elementtree

I have an XML file in the following format:

<?xml version="1.0" encoding="utf-8"?>
<foo>
   <bar>
      <bat>1</bat>
   </bar>
   <a>
      <b xmlns="urn:schemas-microsoft-com:asm.v1">
         <c>1</c>
      </b>
   </a>
</foo>

I want to change the value of bat to "2", so that the file becomes:

<?xml version="1.0" encoding="utf-8"?>
<foo>
   <bar>
      <bat>2</bat>
   </bar>
   <a>
      <b xmlns="urn:schemas-microsoft-com:asm.v1">
         <c>1</c>
      </b>
   </a>
</foo>

I open the file like this:

import xml.etree.ElementTree as ET

tree = ET.parse(filePath)
root = tree.getroot()

Then I change the value of bat to '2' and save the file as follows:

tree.write(filePath, encoding="utf-8", xml_declaration=True, method="xml")

The value of bat is successfully changed to 2, but the XML file now looks like this:

<?xml version="1.0" encoding="utf-8"?>
<foo xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
   <bar>
      <bat>2</bat>
   </bar>
   <a>
      <ns0:b>
         <ns0:c>1</ns0:c>
      </ns0:b>
   </a>
</foo>
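
For reference, a minimal self-contained sketch that reproduces this behavior. It parses the XML from a string instead of filePath and serializes with ET.tostring instead of tree.write, both substitutions made here only to keep the example runnable. The "bar/bat" lookup is likewise an assumption about how the value gets changed.

```python
import xml.etree.ElementTree as ET

# Same document as above, parsed from a string rather than from filePath.
SOURCE = """<foo>
   <bar>
      <bat>1</bat>
   </bar>
   <a>
      <b xmlns="urn:schemas-microsoft-com:asm.v1">
         <c>1</c>
      </b>
   </a>
</foo>"""

root = ET.fromstring(SOURCE)

# Assumed way of changing the value: look up bat by its path and set its text.
root.find("bar/bat").text = "2"

# No prefix is registered for the URI, so the serializer generates one (ns0)
# and declares it on the root element.
output = ET.tostring(root, encoding="unicode")
print(output)
```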

To get rid of the namespace named ns0, I do the following before parsing the document:

ET.register_namespace('', "urn:schemas-microsoft-com:asm.v1")

This gets rid of the ns0 namespace, but the XML file now looks like this:

<?xml version="1.0" encoding="utf-8"?>
<foo xmlns="urn:schemas-microsoft-com:asm.v1">
   <bar>
      <bat>2</bat>
   </bar>
   <a>
      <b>
         <c>1</c>
      </b>
   </a>
</foo>
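
A short sketch of why this output is not equivalent to the original file: with the empty prefix registered, the serializer declares the default namespace on the root element, so re-parsing the result places foo, bar, bat and a in the namespace as well. The document is trimmed to one line and parsed from a string here for brevity; NS and SOURCE are names introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

NS = "urn:schemas-microsoft-com:asm.v1"
SOURCE = f'<foo><bar><bat>2</bat></bar><a><b xmlns="{NS}"><c>1</c></b></a></foo>'

# Register the URI with an empty prefix, i.e. as the default namespace.
ET.register_namespace("", NS)

root = ET.fromstring(SOURCE)
output = ET.tostring(root, encoding="unicode")
print(output)

# In the parsed tree only b and c carry the namespace ...
original_tags = [el.tag for el in root.iter()]
# ... but after a round trip through the serialized text every element does.
reparsed_tags = [el.tag for el in ET.fromstring(output).iter()]
print(original_tags)
print(reparsed_tags)
```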

What do I need to do to get the desired output?

Gio*_*ova 1

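As far as I know, there is no way to achieve your goal using the methods of xml.etree.ElementTree alone. After digging through the xml.etree source code and the XML specification, I found that the library's behavior is neither wrong nor unreasonable. Nevertheless, it does not allow the output you are looking for.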
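
One way to see why the original layout cannot be recovered, sketched here with the default parser and a trimmed copy of the document: after parsing, each element keeps only its expanded name in the form {uri}local, and the xmlns declaration is not stored as an attribute, so the tree no longer records on which element the namespace was declared.

```python
import xml.etree.ElementTree as ET

SOURCE = """<foo>
   <a>
      <b xmlns="urn:schemas-microsoft-com:asm.v1">
         <c>1</c>
      </b>
   </a>
</foo>"""

root = ET.fromstring(SOURCE)
b = root.find("a/{urn:schemas-microsoft-com:asm.v1}b")

# The namespace lives inside the tag name ...
print(b.tag)
# ... and the xmlns declaration is not kept as an attribute.
print(b.attrib)
```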

To achieve your goal with this library, you have to customize the rendering behavior. To best meet your needs, I wrote the following render function.

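from re import findall, sub
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def render(root, buffer='', namespaces=None, level=0, indent_size=2, encoding='utf-8'):
    # Emit the XML declaration once, at the top level only.
    buffer += f'<?xml version="1.0" encoding="{encoding}" ?>\n' if not level else ''
    element = root.getroot() if isinstance(root, ET.ElementTree) else root
    if not level:
        # ET._namespaces is a private helper: it maps every namespace URI used in
        # the tree to its prefix ('' for a URI registered as the default namespace).
        _, namespaces = ET._namespaces(element)
    # Work on a copy so that a declaration written on this element is seen by
    # its descendants but not by its siblings.
    namespaces = dict(namespaces)
    indent = ' ' * indent_size * level
    # Strip the {uri} part of the tag. Prefixes are not re-attached, so only
    # namespaces registered with the empty prefix are rendered correctly.
    tag = sub(r'\{[^}]+\}', '', element.tag)
    buffer += f'{indent}<{tag}'
    for ns in findall(r'\{[^}]+\}', element.tag):
        ns_key = ns[1:-1]
        if ns_key not in namespaces:
            continue  # already declared on an ancestor
        prefix = namespaces.pop(ns_key)
        buffer += ' xmlns' + (f':{prefix}' if prefix else '') + f'="{ns_key}"'
    for k, v in element.attrib.items():
        buffer += f' {k}={quoteattr(v)}'
    buffer += '>' + escape(element.text.strip() if element.text else '')
    # Render each direct child exactly once; the recursion handles deeper levels.
    children = list(element)
    for child in children:
        sep = '\n' if buffer[-1] != '\n' else ''
        buffer += sep + render(child, level=level+1, indent_size=indent_size, namespaces=namespaces)
    buffer += f'{indent}</{tag}>\n' if children else f'</{tag}>\n'
    return buffer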

Feeding your XML input data to the render() function above, as follows:

data = '''<?xml version="1.0" encoding="utf-8"?>
<foo>
   <bar>
      <bat>1</bat>
   </bar>
   <a>
      <b xmlns="urn:schemas-microsoft-com:asm.v1">
         <c>1</c>
      </b>
   </a>
</foo>'''

root = ET.ElementTree(ET.fromstring(data))
ET.register_namespace('', "urn:schemas-microsoft-com:asm.v1")
print(render(root))

It prints the output you are looking for:

<?xml version="1.0" encoding="utf-8" ?>
<foo>
  <bar>
    <bat>1</bat>
  </bar>
  <a>
    <b xmlns="urn:schemas-microsoft-com:asm.v1">
      <c>1</c>
    </b>
  </a>
</foo>