'str' object has no attribute 'find_all' in Beautiful Soup

Question


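This is my code, and it is very simple, yet for some reason I get the error above. Even if I remove text = str(html) and replace soup = BeautifulSoup(text, 'html.parser') with soup = BeautifulSoup(html, 'html.parser'), I get the same error. What is going on?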

import urllib.request
from bs4 import BeautifulSoup

with urllib.request.urlopen('https://jalopnik.com/search?q=mazda&u=&zo=-07:00') as response:
    html = response.read()
text = str(html)
soup = BeautifulSoup(text, 'html.parser')
print(type(soup))
soup = soup.prettify()
print(soup.find_all('div'))


Answer 1

Mar*_*ers 8

soup = soup.prettify() returns a string, and because you assign it back to soup, soup is a plain string by the time you call soup.find_all().
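A minimal reproduction, assuming bs4 is installed and using a small inline HTML string as a stand-in for the Jalopnik page, shows the type change and the resulting error:

```python
from bs4 import BeautifulSoup

# Inline document standing in for the downloaded page (an assumption for illustration)
html = "<html><body><div>one</div><div>two</div></body></html>"

soup = BeautifulSoup(html, 'html.parser')
print(type(soup))    # <class 'bs4.BeautifulSoup'>

pretty = soup.prettify()
print(type(pretty))  # <class 'str'>

# Calling a BeautifulSoup method on the plain string reproduces the error
error_message = None
try:
    pretty.find_all('div')
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)  # 'str' object has no attribute 'find_all'
```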

From the Pretty-printing section of the BeautifulSoup documentation:

The prettify() method will turn a Beautiful Soup parse tree into a nicely formatted Unicode string.

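Don't replace the soup object with the prettified string. BeautifulSoup doesn't need to be prettified; you only need prettify() when you want to turn the soup back into a string, to save it to a file or for debugging.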
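A sketch of that pattern: keep soup as the parse tree for searching, and store the prettify() output in a separate variable used only for display. The inline HTML and the pretty_html name are illustrative assumptions.

```python
from bs4 import BeautifulSoup

html = "<html><body><div>one</div><div>two</div></body></html>"
soup = BeautifulSoup(html, 'html.parser')

# prettify() produces a separate str, used here only for human-readable output
pretty_html = soup.prettify()
print(pretty_html)

# Searching still happens on the untouched BeautifulSoup object
divs = soup.find_all('div')
print([div.get_text() for div in divs])  # ['one', 'two']
```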

soup = BeautifulSoup(text, 'html.parser')
print(soup.find_all('div'))


works just fine.

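You also don't want to use str(html) to decode the bytes object. Normally you would use html.decode('utf8') or similar; str(html) gives you the representation of the bytes, a value that starts with b' and ends with '.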
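The difference is easy to see with a short bytes literal standing in for response.read() (the literal is an assumption; a real page may use a different encoding than UTF-8):

```python
# A bytes value like the one response.read() returns (illustrative literal)
html = b'<div>caf\xc3\xa9</div>'

as_repr = str(html)            # repr of the bytes object, not decoded text
as_text = html.decode('utf8')  # actual decoding into a str

print(as_repr)  # b'<div>caf\xc3\xa9</div>'
print(as_text)  # <div>café</div>
```

Besides the stray b' prefix, str() leaves non-ASCII bytes as literal \x escapes, so the parser would see mangled text.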

However, BeautifulSoup is perfectly capable of decoding bytes values by itself. It can also read directly from the response:

import urllib.request
from bs4 import BeautifulSoup

with urllib.request.urlopen('https://jalopnik.com/search?q=mazda&u=&zo=-07:00') as response:
    soup = BeautifulSoup(response, 'html.parser')
print(soup.find_all('div'))
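To try this without a network request, an io.BytesIO object can stand in for the HTTP response, since both are file-like objects whose read() returns bytes. The stand-in and the inline markup are assumptions for illustration; BeautifulSoup accepts a file-like object, a str, or a bytes value as its markup.

```python
import io

from bs4 import BeautifulSoup

# BytesIO stands in for the urlopen() response: a file-like object yielding bytes
fake_response = io.BytesIO(b'<html><body><div>one</div><div>two</div></body></html>')

# Pass the file-like object directly; BeautifulSoup reads and decodes it
soup = BeautifulSoup(fake_response, 'html.parser')
print(len(soup.find_all('div')))  # 2

# Raw bytes also work, with no manual decode() step
soup_from_bytes = BeautifulSoup(b'<p>hello</p>', 'html.parser')
print(soup_from_bytes.p.get_text())  # hello
```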


Archived:	7 years, 3 months ago
Views:	13731
Last activity:	7 years, 3 months ago