xml.etree.ElementTree: getting the depth of a node

tra*_*lad 10 python xml xml-parsing

XML:

<?xml version="1.0"?>
<pages>
    <page>
        <url>http://example.com/Labs</url>
        <title>Labs</title>
        <subpages>
            <page>
                <url>http://example.com/Labs/Email</url>
                <title>Email</title>
                <subpages>
                    <page/>
                    <url>http://example.com/Labs/Email/How_to</url>
                    <title>How-To</title>
                </subpages>
            </page>
            <page>
                <url>http://example.com/Labs/Social</url>
                <title>Social</title>
            </page>
        </subpages>
    </page>
    <page>
        <url>http://example.com/Tests</url>
        <title>Tests</title>
        <subpages>
            <page>
                <url>http://example.com/Tests/Email</url>
                <title>Email</title>
                <subpages>
                    <page/>
                    <url>http://example.com/Tests/Email/How_to</url>
                    <title>How-To</title>
                </subpages>
            </page>
            <page>
                <url>http://example.com/Tests/Social</url>
                <title>Social</title>
            </page>
        </subpages>
    </page>
</pages>

Code:

# rexml is the XML string read from a URL
from xml.etree import ElementTree as ET

tree = ET.fromstring(rexml)
for node in tree.iter('page'):
    for url in node.iterfind('url'):
        print(url.text)
    for title in node.iterfind('title'):
        print(title.text)
    print('-' * 30)

Output:

http://example.com/article1
Article1
------------------------------
http://example.com/article1/subarticle1
SubArticle1
------------------------------
http://example.com/article2
Article2
------------------------------
http://example.com/article3
Article3
------------------------------

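The XML represents the tree structure of a sitemap.

I've been up and down the docs and Google all day and can't figure out how to get the depth of a node.

I tried counting the child containers, but that only works for the first parent; after that it breaks because I can't figure out how to reset the count. That was probably just a hackish idea anyway.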

Desired output:

0
http://example.com/article1
Article1
------------------------------
1
http://example.com/article1/subarticle1
SubArticle1
------------------------------
0
http://example.com/article2
Article2
------------------------------
0
http://example.com/article3
Article3
------------------------------
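One way to get exactly the numbering in the desired output is to pass the depth down while recursing, and only increase it when stepping into a <page>. This is a minimal sketch, assuming the layout shown above (pages nested as <page> → <subpages> → <page>) and that "depth" means the number of enclosing <page> elements; iter_pages and print_sitemap are hypothetical helper names, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

def iter_pages(elem, depth=0):
    """Yield (page_element, depth) pairs in document order.

    depth is the number of <page> ancestors: top-level pages get 0,
    pages inside one page's <subpages> get 1, and so on.
    """
    for child in elem:
        if child.tag == "page":
            yield child, depth
            # Everything inside this page is one page-level deeper.
            yield from iter_pages(child, depth + 1)
        else:
            # Wrapper elements such as <subpages> do not add a level.
            yield from iter_pages(child, depth)

def print_sitemap(rexml):
    root = ET.fromstring(rexml)
    for page, depth in iter_pages(root):
        print(depth)
        for url in page.iterfind("url"):
            print(url.text)
        for title in page.iterfind("title"):
            print(title.text)
        print("-" * 30)
```

Because the depth travels with the recursion, there is no counter to reset between top-level pages. With the XML exactly as posted, the self-closing <page/> elements are reported as depth-2 pages with no URL or title, since the <url> and <title> after them are their siblings rather than their children.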

Rac*_*hit 6

import xml.etree.ElementTree as etree

tree = etree.ElementTree(etree.fromstring(rexml))
maxdepth = 0

def depth(elem, level):
    """Function to get the maxdepth."""
    global maxdepth
    if level == maxdepth:
        maxdepth += 1
    # recursive call to function to get the depth
    for child in elem:
        depth(child, level + 1)


# Starting at -1 puts the root at level -1, so maxdepth ends up
# as the number of levels below the root.
depth(tree.getroot(), -1)
print(maxdepth)
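The same maximum depth can be computed without a global variable by letting each call return the height of its subtree. A minimal sketch under that approach (tree_height is a hypothetical name); it counts the root itself, so the global maxdepth above corresponds to tree_height(root) - 1.

```python
import xml.etree.ElementTree as ET

def tree_height(elem):
    """Number of elements on the longest path from elem down to a leaf.

    A lone element has height 1. No global state is involved, so the
    function can be called repeatedly on different trees.
    """
    # max() of an empty sequence would raise, hence default=0 for leaves.
    return 1 + max((tree_height(child) for child in elem), default=0)
```

Usage: tree_height(ET.fromstring(rexml)). Like the original, this recurses once per nesting level, so extremely deep documents could hit Python's recursion limit.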

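  • It's always advisable to add at least a minimal description of what the provided code does, explaining how it answers the question. (3 upvotes)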

max*_*zig 5

The Python ElementTree API provides iterators for depth-first traversal of the XML tree; unfortunately, these iterators don't give the caller any depth information.

But you can write a depth-first iterator that also returns the depth of each element:

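import xml.etree.ElementTree as ET

def depth_iter(element, tag=None):
    # Each stack entry is an iterator over the children of one open element.
    stack = []
    stack.append(iter([element]))
    while stack:
        e = next(stack[-1], None)
        if e is None:
            # This level is exhausted: go back up one level.
            stack.pop()
        else:
            # Descend into e; the root element is reported with depth 1.
            stack.append(iter(e))
            if tag is None or e.tag == tag:
                yield (e, len(stack) - 1)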
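For example, the iterator can drive the loop from the question. In the question's layout <pages> has depth 1, top-level <page> elements depth 2, and each further <subpages><page> nesting adds 2, so (depth - 2) // 2 maps them to 0, 1, 2, ... This mapping is an assumption tied to that exact schema; page_level and print_sitemap are hypothetical helpers, and depth_iter is repeated so the block runs on its own.

```python
import xml.etree.ElementTree as ET

def depth_iter(element, tag=None):
    # Same iterator as above; the root element has depth 1.
    stack = [iter([element])]
    while stack:
        e = next(stack[-1], None)
        if e is None:
            stack.pop()
        else:
            stack.append(iter(e))
            if tag is None or e.tag == tag:
                yield e, len(stack) - 1

def page_level(depth):
    # Assumes <pages> at depth 1, top-level <page> at depth 2,
    # and two extra levels (<subpages>, <page>) per nesting step.
    return (depth - 2) // 2

def print_sitemap(rexml):
    root = ET.fromstring(rexml)
    for page, depth in depth_iter(root, "page"):
        print(page_level(depth))
        for url in page.iterfind("url"):
            print(url.text)
        for title in page.iterfind("title"):
            print(title.text)
        print("-" * 30)
```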

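Note that depth_iter is more efficient than determining the depth via parent links (as you would with lxml's getparent()): walking up from every element costs the sum of all element depths, i.e. O(n log n) for a balanced tree and up to O(n²) for a deep chain, whereas depth_iter is O(n).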
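For comparison, the parent-link approach can also be sketched with the standard library. ElementTree elements have no getparent(), so a common workaround is to build a child-to-parent dictionary once (O(n)) and then walk up it. build_parent_map and depth_via_parents are hypothetical names; the root counts as depth 1, matching the lxml answer.

```python
import xml.etree.ElementTree as ET

def build_parent_map(root):
    # Record each element's parent once; ElementTree does not store it.
    return {child: parent for parent in root.iter() for child in parent}

def depth_via_parents(elem, parent_map):
    # Walk up the parent links, counting elem itself (root -> 1).
    d = 0
    while elem is not None:
        d += 1
        elem = parent_map.get(elem)
    return d
```

Each depth_via_parents call costs the element's own depth, which is where the extra cost described in the note comes from when it is done for every element.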


fal*_*tru 4

Using lxml.html:

import lxml.html

rexml = ...

def depth(node):
    # Count the node itself plus all of its ancestors (the root has depth 1).
    d = 0
    while node is not None:
        d += 1
        node = node.getparent()
    return d

tree = lxml.html.fromstring(rexml)
for node in tree.iter('page'):
    print(depth(node))
    for url in node.iterfind('url'):
        print(url.text)
    for title in node.iterfind('title'):
        print(title.text)
    print('-' * 30)
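The counter idea from the question can also work if the counter goes back down whenever an element closes. As a sketch that is not taken from any of the answers: ElementTree's iterparse reports "start" and "end" events, so one integer tracks the depth while the document is parsed, with no tree walk or parent links. Here the root has depth 0 (number of ancestors), the input is assumed to be a str such as rexml, and iter_depths is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET

def iter_depths(xml_text):
    """Yield (tag, depth) for every element while parsing; root is depth 0."""
    depth = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            yield elem.tag, depth
            depth += 1  # children of this element are one level deeper
        else:
            depth -= 1  # element closed: back to its parent's level
```

At a "start" event the element's children and text may not be parsed yet, so this suits reporting depths (e.g. filtering on tag == "page") rather than reading the <url> and <title> contents.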