Extracting the title with BeautifulSoup

Question

I have this:

from urllib import request
url = "http://www.bbc.co.uk/news/election-us-2016-35791008"
html = request.urlopen(url).read().decode('utf8')
html[:60]

from bs4 import BeautifulSoup
raw = BeautifulSoup(html, 'html.parser').get_text()
raw.find_all('title', limit=1)
print (raw.find_all("title"))
# html[:60] evaluates to: '<!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN'

I want to extract the page title with BeautifulSoup, but I get this error:

Traceback (most recent call last):
  File "C:\Users\Passanova\AppData\Local\Programs\Python\Python35-32\test.py", line 8, in <module>
    raw.find_all('title', limit=1)
AttributeError: 'str' object has no attribute 'find_all'

Any suggestions?

Answer 1

SLe*_*ort 21

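To navigate the soup you need a BeautifulSoup object, not a string. get_text() returns the page's text as a plain str, which has no find_all() method, so drop the get_text() call on the soup.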
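To see why the original code fails, here is a minimal offline sketch. It assumes a small hardcoded HTML string (`SAMPLE_HTML`, made up for illustration) in place of the BBC page, so it runs without network access:

```python
from bs4 import BeautifulSoup

# Hypothetical sample page standing in for the downloaded HTML.
SAMPLE_HTML = "<html><head><title>US election 2016</title></head><body><p>Hi</p></body></html>"

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
raw = soup.get_text()  # get_text() flattens the whole document into a plain str

print(type(soup).__name__)       # BeautifulSoup: supports find() / find_all()
print(type(raw).__name__)        # str: has no find_all(), hence the AttributeError
print(hasattr(raw, "find_all"))  # False

# Searching works once you call find_all() on the soup instead of the text.
print(soup.find_all("title", limit=1))
```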

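Also, you can replace raw.find_all('title', limit=1) with find('title'). Both locate the same tag; the difference is that find() returns the tag itself (or None if nothing matches) instead of a list holding at most one tag.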
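A short sketch of how the two calls' return values differ, again using made-up sample HTML rather than the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup.
html = "<html><head><title>Example page</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

matches = soup.find_all("title", limit=1)  # list-like ResultSet with at most one item
tag = soup.find("title")                   # the first matching Tag, or None

print(len(matches))       # 1
print(matches[0] is tag)  # True: both return the same node from the parse tree
print(tag.string)         # Example page

# When nothing matches, the two calls report it differently.
print(soup.find_all("h1", limit=1))  # []
print(soup.find("h1"))               # None
```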

Try this:

from urllib import request
url = "http://www.bbc.co.uk/news/election-us-2016-35791008"
html = request.urlopen(url).read().decode('utf8')
html[:60]

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
title = soup.find('title')

print(title) # Prints the tag
print(title.string) # Prints the tag string content

Answer 2

小智 7

You can use soup.title directly instead of soup.find_all('title', limit=1) or soup.find('title'). It is shorthand for soup.find('title') and gives you the title tag.

from urllib import request
url = "http://www.bbc.co.uk/news/election-us-2016-35791008"
html = request.urlopen(url).read().decode('utf8')
html[:60]

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
title = soup.title
print(title)
print(title.string)
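One caveat with this shortcut: like find(), soup.title is None when the page has no <title>, so title.string would then raise AttributeError. A minimal sketch with two made-up documents, one with a title and one without:

```python
from bs4 import BeautifulSoup

# Hypothetical sample documents: one with a <title>, one without.
with_title = BeautifulSoup("<html><head><title>Hello</title></head></html>", "html.parser")
without_title = BeautifulSoup("<html><body><p>No title here</p></body></html>", "html.parser")

# soup.title is shorthand for soup.find("title").
print(with_title.title is with_title.find("title"))  # True
print(with_title.title.string)                        # Hello

# If the page has no <title>, soup.title is None, so check before using .string.
title = without_title.title
print(title.string if title is not None else "(no title)")  # (no title)
```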

Answer 3

Ali*_*jad 5

It's as simple as this:

from bs4 import BeautifulSoup
soup = BeautifulSoup(htmlString, 'html.parser')  # htmlString: the page's HTML as a str
title = soup.title.text

Here, soup.title returns a BeautifulSoup element (a Tag), namely the title element, and .text gives its text content.
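If you need this in several places, one option is to wrap it in a small function that also copes with a missing <title> and stray whitespace. `page_title` below is a hypothetical helper written for illustration, not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup


def page_title(html: str, default: str = "") -> str:
    """Return the stripped text of the first <title>, or `default` if there is none.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.title  # same as soup.find("title"); None when absent
    if tag is None:
        return default
    return tag.get_text(strip=True)  # like .text, but trims surrounding whitespace


print(page_title("<title>\n  BBC News  \n</title>"))          # BBC News
print(page_title("<p>no title</p>", default="(untitled)"))  # (untitled)
```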

Asked: 9 years, 7 months ago
Viewed: 32472 times
Last activity: 4 years, 8 months ago