I need to completely remove elements from a tree, based on the contents of an attribute, using Python's lxml. Example:
import lxml.etree as et
xml="""
<groceries>
<fruit state="rotten">apple</fruit>
<fruit state="fresh">pear</fruit>
<fruit state="fresh">starfruit</fruit>
<fruit state="rotten">mango</fruit>
<fruit state="fresh">peach</fruit>
</groceries>
"""
tree=et.fromstring(xml)
for bad in tree.xpath("//fruit[@state='rotten']"):
    pass  # remove this element from the tree
print(et.tostring(tree, pretty_print=True).decode())
I would like this to print:
<groceries>
<fruit state="fresh">pear</fruit>
<fruit state="fresh">starfruit</fruit>
<fruit state="fresh">peach</fruit>
</groceries>
Is there a way to do this without storing a temporary variable and printing it out manually, like this:
newxml = "<groceries>\n"
for elt in tree.xpath("//fruit[@state='fresh']"):
    newxml += et.tostring(elt).decode()
newxml += "</groceries>"
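One way to drop the rotten entries in place, sketched here under the assumption that every matched element has a parent (true for anything below the root), is to ask each match for its parent with getparent() and call remove() on it. The helper name remove_rotten is made up for illustration.

```python
import lxml.etree as et

XML = """
<groceries>
<fruit state="rotten">apple</fruit>
<fruit state="fresh">pear</fruit>
<fruit state="fresh">starfruit</fruit>
<fruit state="rotten">mango</fruit>
<fruit state="fresh">peach</fruit>
</groceries>
"""


def remove_rotten(xml_text):
    """Hypothetical helper: return the XML with the rotten fruit removed."""
    tree = et.fromstring(xml_text)
    # xpath() returns a plain list, so removing elements while looping is safe.
    for bad in tree.xpath("//fruit[@state='rotten']"):
        # An element can only be removed through its parent.
        bad.getparent().remove(bad)
    return et.tostring(tree, pretty_print=True).decode()


result = remove_rotten(XML)
print(result)
```

Note that lxml's remove() also discards the removed element's tail text. Here the tail is only the newline after each fruit, so nothing of value is lost, but with mixed content the tail would need to be preserved by hand.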
How can I find, in XPath 1.0, all rows with an empty col name="POW"?
<row>
<col name="WOJ">02</col>
<col name="POW"/>
<col name="GMI"/>
<col name="RODZ"/>
<col name="NAZWA">DOLNOŚLĄSKIE</col>
<col name="NAZDOD">województwo</col>
<col name="STAN_NA">2011-01-01</col>
</row>
I have tried many solutions. A few times the selection worked in the Firefox extension XPath Checker, but lxml.xpath() says the expression is invalid or simply returns no rows.
My Python code:
from lxml import html

with open('TERC.xml', 'r') as f:
    page = html.fromstring(f.read())
for r in page.xpath("//row[col[@name = 'POW' and not(text())]]"):
    print(r.text_content())
    print("-------------------------")
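A possible direction, offered as a sketch rather than a confirmed fix: lxml.html parses the input with HTML rules (in HTML, col is a void element), so the resulting tree may not mirror the XML structure. Parsing with lxml.etree keeps the document as XML, and the predicate not(node()) then matches a col with no children at all. The two-row sample, the rows wrapper element, the NAZWA values A and B, and the helper name rows_with_empty_pow are all invented for illustration; they stand in for the real TERC.xml.

```python
from lxml import etree

# Hypothetical sample standing in for TERC.xml.
SAMPLE = """
<rows>
<row>
<col name="WOJ">02</col>
<col name="POW"/>
<col name="NAZWA">A</col>
</row>
<row>
<col name="WOJ">02</col>
<col name="POW">01</col>
<col name="NAZWA">B</col>
</row>
</rows>
"""


def rows_with_empty_pow(xml_text):
    """Hypothetical helper: rows whose POW column has no content."""
    root = etree.fromstring(xml_text)  # XML parser, not the HTML one
    # not(node()) is true only when col has no child nodes (text or elements).
    return root.xpath("//row[col[@name='POW' and not(node())]]")


rows = rows_with_empty_pow(SAMPLE)
for row in rows:
    # string(...) returns the text value of the first matching col.
    print(row.xpath("string(col[@name='NAZWA'])"))
    print("-------------------------")
```

For a file on disk, etree.parse('TERC.xml') could replace etree.fromstring, with xpath() called on the returned tree.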