What does read() do in urlopen('http .....').read()? [urllib]

Question

Hi, I'm reading "Web Scraping with Python" (2015). I've come across the following two ways of opening a URL, one with .read() and one without. See bs1 and bs2:

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('http://web.stanford.edu/~zlotnick/TextAsData/Web_Scraping_with_Beautiful_Soup.html')
bs1 = BeautifulSoup(html.read(), 'html.parser')

html = urlopen('http://web.stanford.edu/~zlotnick/TextAsData/Web_Scraping_with_Beautiful_Soup.html')
bs2 = BeautifulSoup(html, 'html.parser')

bs1 == bs2  # True


print(bs1.prettify()[0:100])
print(bs2.prettify()[0:100]) # prints same thing

So is the .read() redundant? Thanks.

Code from p. 7 of "Web Scraping with Python" (with .read()):

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page1.html")
bsObj = BeautifulSoup(html.read())

Code from p. 15 (without .read()):

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/warandpeace.html")
bsObj = BeautifulSoup(html)

Answer 1

won*_*ng2 5

urllib.request.urlopen returns a file-like object, and its read method returns the response body for that URL.
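A minimal sketch of what read() does, assuming a `data:` URL as an offline stand-in for the real http:// page (`URL` and `fetch_body` are hypothetical names). For http:// URLs, urlopen returns an http.client.HTTPResponse rather than the object it returns for data: URLs, but both are file-like in the same way:

```python
from urllib.request import urlopen

# Assumption: a data: URL stands in for the real http:// page so the sketch
# runs offline; urlopen still returns a file-like response object for it.
URL = "data:text/html,<p>Hello</p>"


def fetch_body(url):
    """Open url and return its full response body as bytes."""
    with urlopen(url) as response:
        return response.read()


with urlopen(URL) as response:
    print(hasattr(response, "read"))  # True: the response is file-like
    first = response.read()           # reads the entire body
    second = response.read()          # the stream is already exhausted
print(first)   # b'<p>Hello</p>'
print(second)  # b''
```

Because read() consumes the stream, the question's snippet has to call urlopen a second time before building bs2. Passing the already-read html object would hand BeautifulSoup an empty document.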

The BeautifulSoup constructor accepts either a string or an open file handle, so yes, read() is redundant here.

Answer 2

Łuk*_*ski 5

Quoting the BS docs:

To parse a document, pass it into the BeautifulSoup constructor. You can pass in a string or an open filehandle:

When you call .read() yourself, you are using the "string" interface. When you don't, you are using the "filehandle" interface.
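The two interfaces can be sketched with a small dispatcher. `to_markup` is a hypothetical helper, not part of BeautifulSoup; it only illustrates the "has a read() method, so treat it as a file handle" idea. The UTF-8 decode is a simplifying assumption (BS4 does its own encoding detection), and the `data:` URL again stands in for a real page:

```python
import io
from urllib.request import urlopen


def to_markup(markup):
    """Hypothetical helper accepting either a string or an open file handle.

    Anything with a read() method is treated as a file handle and read in
    full; anything else is assumed to already be the document text.
    """
    if hasattr(markup, "read"):  # "filehandle" interface
        markup = markup.read()
    if isinstance(markup, bytes):  # HTTP bodies arrive as bytes
        markup = markup.decode("utf-8")  # assumption: the page is UTF-8
    return markup


URL = "data:text/html,<p>Hello</p>"

# "String" interface: the caller calls read() first.
with urlopen(URL) as response:
    from_string = to_markup(response.read())

# "Filehandle" interface: the helper calls read() itself.
with urlopen(URL) as response:
    from_handle = to_markup(response)

print(from_string == from_handle)  # True
print(to_markup(io.StringIO("<p>Hi</p>")))  # <p>Hi</p>
```

Either way the same text comes out; the only difference is who calls read().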

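In practice, both work the same way. When given a file-like object, BS4 calls its .read() method itself, so in either case the entire response body is read into a single string in memory; calling .read() yourself only makes that step explicit.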
