I have this HTML fragment:
<div id="dw__toc">
<h3 class="toggle">Table of Contents</h3>
<div>
<ul class="toc">
<li class="level1"><div class="li"><a href="#section">#</a></div>
<ul class="toc">
<li class="level2"><div class="li"><a href="#link1">One</a></div></li>
<li class="level2"><div class="li"><a href="#link2">Two</a></div></li>
<li class="level2"><div class="li"><a href="#link3">Three</a></div></li>
Now I want to parse it with lxml.html. In the end I want a function that I can pass a search term to (e.g. "one"), and the function should return:
One
#link1
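For reference, a minimal sketch of what such a function could look like. The name `find_toc_link`, the `SNIPPET` constant, and the case-insensitive comparison done in Python are assumptions for illustration, not something the question specifies. The fragment is left unclosed exactly as posted, relying on lxml's HTML parser to recover the missing end tags.

```python
import lxml.html

# The fragment from the question, left unclosed as posted; lxml's HTML
# parser closes the open tags at the end of input.
SNIPPET = """
<div id="dw__toc">
<h3 class="toggle">Table of Contents</h3>
<div>
<ul class="toc">
<li class="level1"><div class="li"><a href="#section">#</a></div>
<ul class="toc">
<li class="level2"><div class="li"><a href="#link1">One</a></div></li>
<li class="level2"><div class="li"><a href="#link2">Two</a></div></li>
<li class="level2"><div class="li"><a href="#link3">Three</a></div></li>
"""


def find_toc_link(tree, searchterm):
    """Return (link text, href) for the first level-2 TOC entry whose text
    equals searchterm, ignoring case; return None if nothing matches.

    Hypothetical helper, not part of lxml.
    """
    anchors = tree.xpath(
        "//ul[@class='toc']/li[@class='level2']/div[@class='li']/a"
    )
    for anchor in anchors:
        text = (anchor.text or "").strip()
        if text.lower() == searchterm.lower():
            return text, anchor.get("href")
    return None


tree = lxml.html.fromstring(SNIPPET)
print(find_toc_link(tree, "one"))  # ('One', '#link1')
```

Comparing in Python rather than inside the XPath keeps the expression fixed and sidesteps case handling in XPath 1.0, at the cost of fetching every level-2 link first.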
Now I want to use a variable inside the XPath expression.
This works:
import lxml.html
html = lxml.html.parse("www.myurl.com/slash/something")
test=html.xpath("//ul[@class='toc']/li[@class='level2']/div[@class='li']/a/text()='One'")
print test
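One thing worth noting about the expression above: it ends in a comparison, so `xpath()` evaluates it to a boolean instead of returning the matching link. A small sketch of the difference, using a cut-down two-entry version of the fragment (the names `base`, `is_present`, and `links` are made up for this example):

```python
import lxml.html

# Two entries from the fragment in the question, enough to show the difference.
tree = lxml.html.fromstring(
    '<ul class="toc">'
    '<li class="level2"><div class="li"><a href="#link1">One</a></div></li>'
    '<li class="level2"><div class="li"><a href="#link2">Two</a></div></li>'
    '</ul>'
)
base = "//ul[@class='toc']/li[@class='level2']/div[@class='li']/a"

# Comparison at the end of the path: the whole expression is a boolean.
is_present = tree.xpath(base + "/text()='One'")

# Same test written as a predicate: the result is a list of <a> elements.
links = tree.xpath(base + "[text()='One']")

print(is_present)                                  # True
print([(a.text, a.get("href")) for a in links])    # [('One', '#link1')]
```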
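Attempt with a variable. I want to replace the hard-coded 'One' with a variable that I can later pass in to the function.
This does not work: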
import lxml.html
html = lxml.html.parse("www.myurl.com/slash/something")
desiredvars = ['One']
myresultset=((var, html.xpath("//ul[@class='toc']/li[@class='level2']/div[@class='li']/a[text()='%s']"%(var))[0]) for var in desiredvars)
for each in myresultset:
    print each
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, …
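The traceback is cut off, so the actual error is not visible here. Independently of that, a possible alternative to `%` formatting is lxml's support for XPath variables, which are passed as keyword arguments to `xpath()`. The sketch below is only an illustration under that assumption: the variable name `$term`, the `results` dict, and the `"Missing"` entry are invented for the example, and the explicit empty-list check is there because indexing `[0]` on an empty result raises `IndexError`.

```python
import lxml.html

# Cut-down version of the fragment from the question.
tree = lxml.html.fromstring(
    '<ul class="toc">'
    '<li class="level2"><div class="li"><a href="#link1">One</a></div></li>'
    '<li class="level2"><div class="li"><a href="#link2">Two</a></div></li>'
    '<li class="level2"><div class="li"><a href="#link3">Three</a></div></li>'
    '</ul>'
)

# $term is an XPath variable; lxml binds it from the keyword argument,
# so no string formatting or quote escaping is needed.
query = "//ul[@class='toc']/li[@class='level2']/div[@class='li']/a[text()=$term]"

desiredvars = ["One", "Missing"]  # "Missing" is a made-up non-matching term
results = {}
for var in desiredvars:
    matches = tree.xpath(query, term=var)
    # Guard against an empty result instead of indexing [0] blindly.
    results[var] = matches[0].get("href") if matches else None

print(results)  # {'One': '#link1', 'Missing': None}
```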