ElementTree: Element.remove() skips elements during iteration

Sar*_*ara 3 python iteration elementtree xml-parsing

I have this XML input file:

<?xml version="1.0"?>
<zero>
  <First>
    <second>
      <third-num>1</third-num>
      <third-def>object001</third-def>
      <third-len>458</third-len>
    </second>
    <second>
      <third-num>2</third-num>
      <third-def>object002</third-def>
      <third-len>426</third-len>
    </second>
    <second>
      <third-num>3</third-num>
      <third-def>object003</third-def>
      <third-len>998</third-len>
    </second>
  </First>
</zero>

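My goal is to remove every second-level <second> element whose <third-def> is not a given value (object001 in the code below). To do this, I wrote the following code: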

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
inputfile='inputfile.xml'
tree = ET.parse(inputfile)
root = tree.getroot()

elem = tree.find('First')
for elem2 in tree.iter(tag='second'):
    if elem2.find('third-def').text == 'object001':
        pass
    else:
        elem.remove(elem2)
        #elem2.clear()

My problem is with elem.remove(elem2): it skips every other <second> element. Here is the output of this code:

<?xml version="1.0" ?>
<zero>
  <First>
    <second>
      <third-num>1</third-num>
      <third-def>object001</third-def>
      <third-len>458</third-len>
    </second>
    <second>
      <third-num>3</third-num>
      <third-def>object003</third-def>
      <third-len>998</third-len>
    </second>
  </First>
</zero>

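Now, if I uncomment the elem2.clear() line, the script works, but the output is not quite right, because it keeps all the removed <second> elements as empty tags: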

<?xml version="1.0" ?>
<zero>
  <First>
    <second>
      <third-num>1</third-num>
      <third-def>object001</third-def>
      <third-len>458</third-len>
    </second>
    <second/>
    <second/>
  </First>
</zero>
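To see why clear() avoids the skipping but leaves empty tags behind, here is a minimal sketch using a trimmed, inlined copy of the XML (an assumption, so the snippet runs without inputfile.xml). Element.clear() removes an element's children, text, tail and attributes, but it does not detach the element from its parent, so <First> keeps all three children and the live iterator reaches every one of them.

```python
import xml.etree.ElementTree as ET

# Trimmed, inlined version of the question's input (assumption for a self-contained demo)
XML = """<zero><First>
<second><third-def>object001</third-def></second>
<second><third-def>object002</third-def></second>
<second><third-def>object003</third-def></second>
</First></zero>"""

root = ET.fromstring(XML)
first = root.find('First')

visited = []
for second in root.iter('second'):       # live iteration, as in the question
    value = second.find('third-def').text
    visited.append(value)
    if value != 'object001':
        second.clear()                    # empties the element but keeps it inside <First>

print(visited)                   # ['object001', 'object002', 'object003']
print(len(first))                # 3: no <second> was detached
print([len(s) for s in first])   # [1, 0, 0]: the cleared ones are now empty
```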

Does anyone know what is wrong with my Element.remove() call?

Mar*_*ers 7

You are looping over the live tree:

for elem2 in tree.iter(tag='second'):

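and then changing it while you iterate. The iterator's "counter" is not told that the number of elements has changed, so when it looks at element 0 and you remove that element, the iterator then moves on to element 1. But what used to be element 1 is now element 0, so it is never visited.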
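A minimal reproduction, again with a trimmed, inlined copy of the question's XML (an assumption, so no input file is needed), records which <second> elements the loop actually reaches:

```python
import xml.etree.ElementTree as ET

# Trimmed, inlined version of the question's input (assumption for a self-contained demo)
XML = """<zero><First>
<second><third-def>object001</third-def></second>
<second><third-def>object002</third-def></second>
<second><third-def>object003</third-def></second>
</First></zero>"""

root = ET.fromstring(XML)
first = root.find('First')

visited = []
for second in root.iter('second'):   # walks the live child list of <First>
    value = second.find('third-def').text
    visited.append(value)
    if value != 'object001':
        first.remove(second)          # shifts the later siblings one slot to the left

remaining = [s.find('third-def').text for s in first.findall('second')]
print(visited)    # ['object001', 'object002']
print(remaining)  # ['object001', 'object003']
```

object003 is never visited: once object002 is removed, object003 moves into the slot the iterator has already passed, which matches the output in the question.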

Capture a list of all the elements first, and then loop over that list:

for elem2 in tree.findall('.//second'):

.findall() returns a list of results, and that list is not updated when you change the tree.

Now the iteration no longer skips the last element:

>>> print(ET.tostring(root, encoding='unicode'))
<zero>
  <First>
    <second>
      <third-num>1</third-num>
      <third-def>object001</third-def>
      <third-len>458</third-len>
    </second>
    </First>
</zero>
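Put together, a corrected version of the question's script could look like the sketch below. It inlines the XML instead of reading inputfile.xml (an assumption for self-containment), and the name KEEP is a hypothetical constant standing in for the value to preserve. It searches only the direct children of <First> with first.findall('second'), because Element.remove() can only detach a direct child of the element it is called on.

```python
import xml.etree.ElementTree as ET

# The question's input, inlined instead of parsed from inputfile.xml (assumption)
XML = """<zero>
  <First>
    <second>
      <third-num>1</third-num>
      <third-def>object001</third-def>
      <third-len>458</third-len>
    </second>
    <second>
      <third-num>2</third-num>
      <third-def>object002</third-def>
      <third-len>426</third-len>
    </second>
    <second>
      <third-num>3</third-num>
      <third-def>object003</third-def>
      <third-len>998</third-len>
    </second>
  </First>
</zero>"""

KEEP = 'object001'  # hypothetical name for the <third-def> value to keep

root = ET.fromstring(XML)
first = root.find('First')

# findall() builds the list up front; removing elements from <First>
# does not change that list, so no <second> is skipped.
for second in first.findall('second'):
    if second.find('third-def').text != KEEP:
        first.remove(second)

print(ET.tostring(root, encoding='unicode'))
```

With the real file, tree = ET.parse(inputfile) and root = tree.getroot() replace ET.fromstring(), and tree.write(...) saves the result. Taking a snapshot with list(first) instead of findall() would work the same way.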

This behavior is not limited to ElementTree trees; see Loop "forgetting" to remove some items.
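The same effect shows up with a plain Python list. The sketch below borrows the question's values for illustration: removing items from the list being iterated skips one, while iterating over a shallow copy does not.

```python
# Iterating the list that is being modified: 'object003' slides into
# the slot the loop has already passed, so it is never checked.
skipped = ['object001', 'object002', 'object003']
for item in skipped:
    if item != 'object001':
        skipped.remove(item)
print(skipped)  # ['object001', 'object003']

# Iterating a shallow copy (fixed[:]) while removing from the original.
fixed = ['object001', 'object002', 'object003']
for item in fixed[:]:
    if item != 'object001':
        fixed.remove(item)
print(fixed)    # ['object001']
```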