I am trying to parse the reviews on this page: http://www.amazon.co.uk/product-reviews/B00143ZBHY
using the following approach:
Code:
# html is a variable that contains the exact HTML served at the page above.
from lxml import etree
tree = etree.HTML(html)
r = tree.xpath(".//*[@id='productReviews']/tbody/tr/td[1]/div[9]/text()[4]")
print len(r)
print r[0].tag
Output:
0
Traceback (most recent call last):
  File "c.py", line 37, in <module>
    print r[0].tag
IndexError: list index out of range
P.S.: The same XPath works without any trouble in the XPath Checker add-on for Firefox, but here it returns no results. Please help!
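Try removing /tbody from the XPath: there is no <tbody> in #productReviews in the HTML the server sends. Firefox inserts an implied <tbody> when it builds the DOM, which is most likely why the expression matches in XPath Checker but not against the raw HTML parsed by lxml.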
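import urllib2
from lxml import etree

# Fetch the raw HTML exactly as the server sends it.
html = urllib2.urlopen("http://www.amazon.co.uk/product-reviews/B00143ZBHY").read()
tree = etree.HTML(html)
# Same XPath as in the question, with the /tbody step removed.
r = tree.xpath(".//*[@id='productReviews']/tr/td[1]/div[9]/text()[4]")
print r[0]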
Output:
bought this as replacement for the original cover which came with my greenhouse and which ripped in the wind. so far this seems a good replacement although for some reason it seems slightly too small for my greenhouse so that i cant zip both sides of the front at the same time. seems sturdier and thicker than the cover i had before so hoping it lasts a bit longer!
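
To see the effect without hitting the live page, here is a minimal sketch on a made-up fragment. RAW_MARKUP, first_text and the two path constants are hypothetical names introduced for illustration, and the fragment is parsed as XML with fromstring rather than with etree.HTML, so the same lines also run on the standard library's ElementTree if lxml is not installed. It only shows that a /tbody step matches nothing when the source has no <tbody>, and that checking for an empty result avoids the IndexError.

```python
try:
    from lxml import etree  # the library used in this thread
except ImportError:
    # Assumption: fall back to the standard library, which offers the same
    # fromstring/findall calls used below.
    import xml.etree.ElementTree as etree

# Hypothetical stand-in for the page source: the rows are direct children of
# <table>, with no <tbody>, as in raw HTML that a browser has not normalised.
RAW_MARKUP = """
<html><body>
<table id="productReviews">
  <tr><td><div>rating</div><div>first review text</div></td></tr>
</table>
</body></html>
"""

WITH_TBODY = ".//*[@id='productReviews']/tbody/tr/td[1]/div[2]"
WITHOUT_TBODY = ".//*[@id='productReviews']/tr/td[1]/div[2]"


def first_text(root, path):
    """Return the text of the first element matching path, or None."""
    matches = root.findall(path)
    if not matches:  # guard instead of indexing an empty list
        return None
    return matches[0].text


root = etree.fromstring(RAW_MARKUP)
print(first_text(root, WITH_TBODY))     # the path asks for a tbody that is not there
print(first_text(root, WITHOUT_TBODY))  # the path follows the markup as written
```

When an XPath copied from a browser tool returns an empty list, comparing it against the raw source (view source, not the inspector) is a quick way to spot steps such as tbody that exist only in the browser's DOM.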