How do I get the full XML or HTML content of an element using ElementTree?

pup*_*eno 10 python xml api elementtree

That is, all the text and subtags, but without the tags of the element itself?

<p>blah <b>bleh</b> blih</p>

I want:

blah <b>bleh</b> blih

element.text returns "blah " and etree.tostring(element) returns:

<p>blah <b>bleh</b> blih</p>

S.L*_*ott 11

ElementTree works perfectly; you just have to assemble the answer yourself. Something like this:

"".join([t.text or ""] + [xml.tostring(e, encoding="unicode") for e in t])

Thanks to JV and PEZ for pointing out the error.


Edit:

>>> import xml.etree.ElementTree as xml
>>> s = '<p>blah <b>bleh</b> blih</p>\n'
>>> t = xml.fromstring(s)
>>> "".join([t.text or ""] + [xml.tostring(e, encoding="unicode") for e in t])
'blah <b>bleh</b> blih'

The tail doesn't need separate handling: tostring(e) already appends each child's tail text.
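
To illustrate the point about the tail, here is a minimal sketch using the question's sample markup. The names `root` and `child` are only for illustration, and `encoding="unicode"` is used so that `tostring` returns a `str` on Python 3.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<p>blah <b>bleh</b> blih</p>")
child = root[0]  # the <b> element

# The text that follows </b> is stored on the child as its tail...
print(repr(child.tail))  # ' blih'

# ...and tostring() serializes that tail together with the element.
print(ET.tostring(child, encoding="unicode"))  # <b>bleh</b> blih
```

So joining the parent's text with the serialized children already covers every piece of text inside the parent.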


pup*_*eno 6

This is the solution I ended up using:

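import xml.etree.ElementTree as etree  # assumed import; the original snippet omitted it

def element_to_string(element):
    # Text before the first child (None when absent).
    s = element.text or ""
    for sub_element in element:
        # encoding="unicode" returns str rather than bytes on Python 3;
        # the serialized child already includes its tail text.
        s += etree.tostring(sub_element, encoding="unicode")
    # element.tail is the text after this element's closing tag; guard against None.
    s += element.tail or ""
    return s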
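
One caveat with the approaches in this thread: `element.text` holds the unescaped text, while `tostring` emits escaped markup, so text containing `&` or `<` would come out unescaped next to serialized children. A hedged sketch of a variant that escapes the leading text follows. `inner_xml` is a hypothetical helper name, not part of ElementTree, and this version deliberately leaves out the element's own tail.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def inner_xml(element):
    """Return the serialized content of `element` without its own tags."""
    # element.text is stored unescaped, so escape it before emitting markup.
    parts = [escape(element.text or "")]
    # tostring() on each child includes that child's tail text.
    parts.extend(ET.tostring(child, encoding="unicode") for child in element)
    return "".join(parts)


root = ET.fromstring("<p>a &amp; b <b>bleh</b> blih</p>")
print(inner_xml(root))  # a &amp; b <b>bleh</b> blih
```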