Par*_*gue 0 python lxml screen-scraping screen web-scraping
我一直在努力编写过去几个小时的程序,我认为这是一个非常简单的任务:
我尝试过使用带有lxml的Xpath,但没有经验,每一个结构都带有一个空白数组.
报价的实际内容似乎包含在"sqq"类中.
如果我通过Firebug导航网站,单击DOM选项卡,它出现在textNode属性"wholeText"或"textContent"中 - 但我不知道如何以编程方式使用该知识.
有任何想法吗?
import lxml.html
import urllib
site = 'http://thinkexist.com/search/searchquotation.asp'
userInput = raw_input('Search for: ').strip()
url = site + '?' + urllib.urlencode({'search':userInput})
root = lxml.html.parse(url).getroot()
quotes = root.xpath('//a[@class="sqq"]')
print quotes[0].text_content()
Run Code Online (Sandbox Code Playgroud)
......如果你进入'莎士比亚',它就会回来
In real life, unlike in Shakespeare, the sweetness
of the rose depends upon the name it bears. Things
are not only what they are. They are, in very important
respects, what they seem to be.
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
1651 次 |
| 最近记录: |