Retrieving information from the web with Python using urllib and BeautifulSoup

Question

pro*_*eek 10 python urllib2 beautifulsoup web-scraping

I can fetch an HTML page with urllib and parse it with BeautifulSoup, but it looks like I have to write the page to a file for BeautifulSoup to read.

import urllib

# Python 2: urllib.urlopen returns a file-like object
sock = urllib.urlopen("http://SOMEWHERE")
htmlSource = sock.read()
sock.close()
# --> write htmlSource to a file

Is there a way to call BeautifulSoup without writing the urllib output to a file first?

Answer 1

int*_*jay 20

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(htmlSource)

There is no need to write a file: just pass in the HTML string. You can also pass the object returned by urlopen directly:

f = urllib.urlopen("http://SOMEWHERE") 
soup = BeautifulSoup(f)

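The snippets above are Python 2 with BeautifulSoup 3. On Python 3 the same idea could look roughly like the sketch below. It assumes the beautifulsoup4 package (imported as bs4) is installed and uses the standard library's html.parser backend. The helper names soup_from_url and soup_from_string are made up for illustration and are not part of either library.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup  # assumption: third-party beautifulsoup4 is installed


def soup_from_url(url):
    """Fetch url and parse the response in memory, without a temporary file."""
    # Hypothetical helper: the response is a file-like object, and
    # BeautifulSoup reads from it directly.
    with urlopen(url) as response:
        return BeautifulSoup(response, "html.parser")


def soup_from_string(html):
    """Parse HTML that is already held in a str or bytes object."""
    # Hypothetical helper: the markup goes straight to the parser.
    return BeautifulSoup(html, "html.parser")
```

With these, soup_from_url("http://SOMEWHERE") plays the role of the second snippet and soup_from_string(htmlSource) the role of the first. Neither touches the filesystem.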
