Building a structure graph of an XML document

int*_*ted 6 python xml lxml graph dotfiles

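I'd like to build a graph showing which tags are used as children of which other tags in a given XML document.

I've written this function to get the set of unique child tags for a given tag in an lxml.etree tree: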

from itertools import chain


def iter_unique_child_tags(root, tag):
    """Iterates through unique child tags for all instances of tag.

    Iteration starts at `root`; only descendants of `root` are searched.
    """
    found_child_tags = set()
    instances = root.iterdescendants(tag)
    # Iterating an element yields its direct children, which avoids the
    # deprecated getchildren() call.
    child_nodes = chain.from_iterable(instances)
    child_tags = (n.tag for n in child_nodes)
    for t in child_tags:
        # Yield each child tag only the first time it is seen.
        if t not in found_child_tags:
            found_child_tags.add(t)
            yield t

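Is there a general-purpose graph builder that I could use with this function to build a dot file, or a graph in some other format?

I also have a sneaking suspicion that a tool designed explicitly for this purpose already exists somewhere; what might that be?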

int*_*ted 3

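I ended up using python-graph. I also ended up using argparse to build a command-line interface that pulls some basic information from XML documents and builds graph images in the formats supported by pydot. It is called xmlearn, and it has proven useful: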

usage: xmlearn [-h] [-i INFILE] [-p PATH] {graph,dump,tags} ...

optional arguments:
  -h, --help            show this help message and exit
  -i INFILE, --infile INFILE
                        The XML file to learn about. Defaults to stdin.
  -p PATH, --path PATH  An XPath to be applied to various actions.
                        Defaults to the root node.

subcommands:
  {graph,dump,tags}
    dump                Dump xml data according to a set of rules.
    tags                Show information about tags.
    graph               Build a graph from the XML tags relationships.
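
For readers who only need the dot file and not a full graph library, the parent/child relationship can also be written out as DOT text by hand. The following is a minimal sketch and not how xmlearn works internally: it assumes the standard-library xml.etree.ElementTree in place of lxml so that it runs without extra packages, and the names tag_edges, to_dot and SAMPLE are hypothetical, introduced only for this illustration.

```python
import xml.etree.ElementTree as ET


def tag_edges(root):
    """Return the sorted, de-duplicated (parent_tag, child_tag) pairs under root."""
    edges = set()
    for parent in root.iter():  # root itself, then every descendant
        for child in parent:  # iterating an element yields its direct children
            # Guard against comment or processing-instruction nodes,
            # whose tag is not a string.
            if isinstance(parent.tag, str) and isinstance(child.tag, str):
                edges.add((parent.tag, child.tag))
    return sorted(edges)


def to_dot(edges, name="xml_tags"):
    """Render the edges as DOT text, one 'parent -> child' line per pair."""
    lines = ["digraph %s {" % name]
    lines.extend('    "%s" -> "%s";' % edge for edge in edges)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample document, used only to exercise the two functions.
SAMPLE = "<library><book><title/><author/></book><book><title/></book></library>"

edges = tag_edges(ET.fromstring(SAMPLE))
dot_text = to_dot(edges)
print(dot_text)
```

Saving the printed text to a file and passing it to Graphviz (for example `dot -Tpng tags.dot -o tags.png`) should produce an image of the tag hierarchy. Collecting the edges in a set means each parent/child pair appears once, however often it occurs in the document.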