I have an HTML document similar to the following:
<html xmlns="http://www.w3.org/1999/xhtml">
<div id="Symbols" class="cb">
<table class="quotes">
<tr><th>Code</th><th>Name</th>
<th style="text-align:right;">High</th>
<th style="text-align:right;">Low</th>
</tr>
<tr class="ro" onclick="location.href='/xyz.com/A.htm';" style="color:red;">
<td><a href="/xyz.com/A.htm" title="Display,A">A</a></td>
<td>A Inc.</td>
<td align="right">45.44</td>
<td align="right">44.26</td>
<tr class="re" onclick="location.href='/xyz.com/B.htm';" style="color:red;">
<td><a href="/xyz.com/B.htm" title="Display,B">B</a></td>
<td>B Inc.</td>
<td align="right">18.29</td>
<td align="right">17.92</td>
</div></html>
I need to extract the Code/Name/High/Low information from the table.
I used the following code, adapted from a similar example on Stack Overflow:
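A minimal sketch of that extraction, assuming lxml is installed and the live table uses the same column order as the sample. It parses a trimmed copy of the sample HTML from a string (no network access) and returns one (code, name, high, low) tuple per data row. `extract_quotes` is a hypothetical helper name, and converting High/Low with `float()` assumes those cells always hold plain numbers.

```python
from lxml import html

# Trimmed copy of the sample table above (assumption: the live page has the same layout).
SAMPLE = """
<div id="Symbols" class="cb">
<table class="quotes">
<tr><th>Code</th><th>Name</th>
<th style="text-align:right;">High</th>
<th style="text-align:right;">Low</th>
</tr>
<tr class="ro"><td><a href="/xyz.com/A.htm" title="Display,A">A</a></td>
<td>A Inc.</td><td align="right">45.44</td><td align="right">44.26</td>
<tr class="re"><td><a href="/xyz.com/B.htm" title="Display,B">B</a></td>
<td>B Inc.</td><td align="right">18.29</td><td align="right">17.92</td>
</div>
"""


def extract_quotes(page):
    """Return (code, name, high, low) tuples for every table.quotes row that has <td> cells."""
    doc = html.fromstring(page)
    rows = []
    # '//tr' below the table matches rows whether or not a <tbody> is present;
    # the [td] predicate skips the header row, which only has <th> cells.
    for tr in doc.xpath('//table[@class="quotes"]//tr[td]'):
        # text_content() also picks up the text inside the <a> link in the Code column.
        cells = [td.text_content().strip() for td in tr.xpath('./td')]
        code, name, high, low = cells[:4]
        rows.append((code, name, float(high), float(low)))
    return rows


for row in extract_quotes(SAMPLE):
    print(row)
```

Using `text_content()` on each cell avoids the three-way union in the original XPath, since the linked Code cell and the plain cells are handled the same way.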
#############################
import urllib2
from lxml import html, etree

# Download the page and parse it with lxml's HTML parser
webpg = urllib2.urlopen('http://www.eoddata.com/stocklist/NYSE/A.htm').read()
table = html.fromstring(webpg)
# One iteration per table row: header cells, the linked code, then the remaining cells
for row in table.xpath('//table[@class="quotes"]/tbody/tr'):
    for column in row.xpath('./th[position()>0]/text() | ./td[position()=1]/a/text() | ./td[position()>1]/text()'):
        print column.strip(),
    print
#############################
I got no output. I had to change the XPath in the first loop to table.xpath('//tr') …
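The empty result is consistent with a `<tbody>` mismatch: browsers (and therefore Firebug) add a `<tbody>` to every table in the DOM, but lxml's HTML parser builds its tree from the raw markup, and the sample above contains no `<tbody>`. A small check on a minimal table, assuming lxml is installed:

```python
from lxml import html

# Minimal table written without <tbody>, like the raw markup in the question.
RAW = '<table class="quotes"><tr><th>Code</th></tr><tr><td>A</td></tr></table>'
doc = html.fromstring(RAW)

# Path as it appears in the browser's DOM, including the inserted <tbody>.
with_tbody = doc.xpath('//table[@class="quotes"]/tbody/tr')
# Descendant axis: matches rows whether or not a <tbody> is present.
any_depth = doc.xpath('//table[@class="quotes"]//tr')

print(len(with_tbody), len(any_depth))  # 0 2
```

Using `//table[@class="quotes"]//tr` in the outer loop keeps the table filter from the original code while working with or without `<tbody>`.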
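I am trying to print/save the HTML of a particular element from a web page.
I got the XPath of the element from Firebug.
I want to save this element to a file, but I can't seem to get it to work.
(I tried the XPath both with and without /text() at the end.)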
I'd appreciate any help or past experience.
Thanks, David
import urllib2,StringIO
from lxml import etree
url='http://www.tutiempo.net/en/Climate/Londres_Heathrow_Airport/12-2009/37720.htm'
# Download the raw page source
seite = urllib2.urlopen(url)
html = seite.read()
seite.close()
# Parse it with lxml's forgiving HTML parser
parser = etree.HTMLParser()
tree = etree.parse(StringIO.StringIO(html), parser)
# XPath copied from Firebug
xpath = "/html/body/table/tbody/tr/td[2]/div/table/tbody/tr[6]/td/table/tbody/tr/td[3]/table/tbody/tr[3]/td/table/tbody/tr/td/table/tbody/tr/td/table/tbody/text()"
elem = tree.xpath(xpath)
print elem[0].strip().encode("utf-8")