Retrieving information from the web with Python using urllib and BeautifulSoup

pro*_*eek 10 python urllib2 beautifulsoup web-scraping

I can fetch an HTML page with urllib and parse it with BeautifulSoup, but it looks like I have to generate a file for BeautifulSoup to read from.

import urllib
sock = urllib.urlopen("http://SOMEWHERE")
htmlSource = sock.read()
sock.close()
# --> write htmlSource to a file

Is there a way to call BeautifulSoup without generating a file from the urllib output?

int*_*jay 20

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(htmlSource)

There is no need to write a file: just pass in the HTML string. You can also pass the object returned by urlopen directly:

# urlopen returns a file-like object; BeautifulSoup reads from it directly
f = urllib.urlopen("http://SOMEWHERE")
soup = BeautifulSoup(f)
Run Code Online (Sandbox Code Playgroud)
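For readers on Python 3, the same two options can be sketched roughly as follows. This is not part of the original answer: it assumes the BeautifulSoup 4 package (imported as bs4) and the standard-library "html.parser" backend, and the sample markup is made up. An in-memory io.BytesIO object stands in for the response that urllib.request.urlopen would return, so the sketch runs without network access.

```python
import io

from bs4 import BeautifulSoup  # assumes BeautifulSoup 4 (pip install beautifulsoup4)

# Made-up markup standing in for a downloaded page.
html_source = "<html><head><title>Example</title></head><body><p>Hello</p></body></html>"

# Option 1: pass the HTML string directly; no file is written.
soup_from_string = BeautifulSoup(html_source, "html.parser")

# Option 2: pass a file-like object. BytesIO is a stand-in here; with a real
# page you could instead do something like:
#   with urllib.request.urlopen("http://SOMEWHERE") as response:
#       soup = BeautifulSoup(response, "html.parser")
fake_response = io.BytesIO(html_source.encode("utf-8"))
soup_from_file = BeautifulSoup(fake_response, "html.parser")

title_from_string = soup_from_string.title.string
title_from_file = soup_from_file.title.string
print(title_from_string, title_from_file)
```

Both calls yield an equivalent parse tree; passing the response object simply saves the explicit read() step.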