How do I parse a website using Selenium and BeautifulSoup in Python?

Question


twi*_*fee 39 python selenium beautifulsoup

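I'm new to programming and have figured out how to use Selenium to navigate to the page I need. Now I want to parse the data, but I don't know where to start. Could someone hold my hand for a second and point me in the right direction?

Any help is appreciated.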

Answer 1

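Assuming you are already on the page you want to parse, Selenium stores the source HTML in the driver's page_source attribute. You can then load page_source into BeautifulSoup as follows: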

In [8]: from bs4 import BeautifulSoup

In [9]: from selenium import webdriver

In [10]: driver = webdriver.Firefox()

In [11]: driver.get('http://news.ycombinator.com')

In [12]: html = driver.page_source

In [13]: soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly

In [14]: for tag in soup.find_all('title'):
   ....:     print(tag.text)
   ....:     
Hacker News
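
The session above can also be packaged as a small script. The following is a minimal sketch, not the answerer's code: the helper names `page_title` and `fetch_html` are hypothetical, `html.parser` is used only because it ships with Python, and Firefox with a matching driver is assumed to be installed.

```python
from bs4 import BeautifulSoup


def page_title(html):
    """Return the text of the first <title> tag in html, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("title")
    return title.get_text(strip=True) if title is not None else None


def fetch_html(url):
    """Open url in Firefox and return the rendered page source."""
    # Imported here so page_title() can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if get() raises
```

Calling `page_title(fetch_html('http://news.ycombinator.com'))` would repeat the title lookup from the session. Keeping the parsing in its own function means it can be tried on any HTML string without starting a browser.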


Answer 2

roo*_*oot 15

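Since your question isn't particularly specific, here is a simple example. To do something more useful, read the BS documentation. You can also find plenty of Selenium (and BS) usage examples on SO.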

from selenium import webdriver
from bs4 import BeautifulSoup

browser = webdriver.Firefox()
browser.get('http://webpage.com')

# Parse the rendered page; naming the parser avoids a bs4 warning
soup = BeautifulSoup(browser.page_source, 'html.parser')

# Do something useful:
# print every link's href (None if missing) along with its text
for link in soup.find_all('a'):
    print(link.get('href', None), link.get_text())
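
The link loop can be taken one step further by collecting the results instead of printing them. This is a rough sketch under a few assumptions: `extract_links` and `sample_html` are names made up for illustration, the sample markup stands in for `browser.page_source` so the code runs without a browser, and resolving relative links with `urljoin` is an addition that the answer itself does not make.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return (absolute_url, link_text) pairs for every <a href=...> in html."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips anchors that have no href attribute at all.
    for a in soup.find_all("a", href=True):
        links.append((urljoin(base_url, a["href"]), a.get_text(strip=True)))
    return links


# Stand-in for browser.page_source, so the sketch runs without a browser.
sample_html = """
<html><body>
  <a href="/news">News</a>
  <a href="https://example.com/about">About</a>
  <a name="top">No href</a>
</body></html>
"""

for url, text in extract_links(sample_html, "http://webpage.com"):
    print(url, text)
```

With a live Selenium session, the same function could be called as `extract_links(browser.page_source, browser.current_url)`.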


Asked:	13 years, 2 months ago
Viewed:	62,032 times
Last active:	13 years, 2 months ago