Get all HTML tags with Beautiful Soup

Question

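I am trying to get a list of all HTML tags from Beautiful Soup.

I have seen find_all(), but it looks like I have to know the name of the tag before I search.

For example, given text like this: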

html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>"""

how can I get a list like this?

list_of_tags = ["<div>", "<div>", "<div class='magical'>", "<p>"]

I know how to do this with a regular expression, but I am trying to learn BS4.

Answer 1

ale*_*cxe 26

You don't have to pass any arguments to find_all(). Called with none, BeautifulSoup recursively finds every tag in the tree. Sample:

>>> from bs4 import BeautifulSoup
>>>
>>> html = """<div>something</div>
... <div>something else</div>
... <div class='magical'>hi there</div>
... <p>ok</p>"""
>>> soup = BeautifulSoup(html, "html.parser")
>>> [tag.name for tag in soup.find_all()]
[u'div', u'div', u'div', u'p']
>>> [str(tag) for tag in soup.find_all()]
['<div>something</div>', '<div>something else</div>', '<div class="magical">hi there</div>', '<p>ok</p>']
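
Note that str(tag) returns the whole element, including its text and closing tag, while the question asks for opening tags only. One way to get those, as a rough sketch, is to rebuild each opening tag from tag.name and tag.attrs. The helper name opening_tag is made up for illustration, and the sketch assumes attribute values never contain double quotes (it does no escaping).

```python
from bs4 import BeautifulSoup

html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>"""


def opening_tag(tag):
    """Rebuild only the opening tag from a Tag's name and attributes.

    Hypothetical helper, not part of bs4. Attribute values are not escaped.
    """
    parts = [tag.name]
    for key, value in tag.attrs.items():
        # Multi-valued attributes such as class are stored as lists.
        if isinstance(value, list):
            value = " ".join(value)
        parts.append(f'{key}="{value}"')
    return "<" + " ".join(parts) + ">"


soup = BeautifulSoup(html, "html.parser")
list_of_tags = [opening_tag(tag) for tag in soup.find_all()]
print(list_of_tags)
# ['<div>', '<div>', '<div class="magical">', '<p>']
```

The attribute quoting comes out as double quotes rather than the single quotes in the original markup, because the parser does not keep the original quote style.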

Answer 2

小智 7

Try the following:

# find_all(True) matches every tag; findAll is the older alias of the same method.
for tag in soup.find_all(True):
    print(tag.name)
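
As a small sketch of how this relates to the first answer: passing True as the name matches every tag, so it should return the same tags as calling find_all() with no arguments. The nested markup below is invented for illustration and is not from the question. It also shows the recursive=False argument, which limits the search to direct children.

```python
from bs4 import BeautifulSoup

# Hypothetical nested markup, used only to show the traversal order.
html = "<div><p>ok <b>bold</b></p></div><span>end</span>"
soup = BeautifulSoup(html, "html.parser")

# Both calls walk the whole tree in document order.
all_tags = [tag.name for tag in soup.find_all(True)]
same_tags = [tag.name for tag in soup.find_all()]

# recursive=False only looks at direct children of the object searched.
top_level = [tag.name for tag in soup.find_all(True, recursive=False)]

print(all_tags)   # ['div', 'p', 'b', 'span']
print(top_level)  # ['div', 'span']
```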
