Finding the first tag in an HTML file with BeautifulSoup

Question

I have a set of HTML files, and I want to extract the first tag in each one. Since the files don't share a specific tag that is always the first one in the file, I'm not sure how to do this.

For example, in the following snippet the first tag is <html>.

<html>
 <head>
    <title>
     insert title here
    </title>
 </head>
</html>

Is there a way to do this with BeautifulSoup (or perhaps another tool)? Thanks in advance :)

Answer 1

ale*_*cxe 6

In this case you can use BeautifulSoup: just call find() on the BeautifulSoup object, and it will find the first element in the tree. .name then gives you the tag name:

from bs4 import BeautifulSoup

data = """
<html>
 <head>
    <title>
     insert title here
    </title>
 </head>
</html>
"""

soup = BeautifulSoup(data, "html.parser")
print(soup.find().name)
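
The question is about a set of files, and some of them may contain no tags at all. In that case find() returns None and .name would raise an AttributeError. Below is a minimal sketch of a helper that guards against that. It assumes bs4 is installed. The function name first_tag_name and the sample file names are hypothetical, and find(True) is used because the BeautifulSoup documentation describes True as matching every tag.

```python
from bs4 import BeautifulSoup


def first_tag_name(html):
    """Return the name of the first tag in `html`, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # find(True) matches any tag; text, comments and the doctype are skipped.
    first = soup.find(True)
    return first.name if first is not None else None


# Hypothetical inputs standing in for the contents of several files.
samples = {
    "page.html": "<html><head><title>insert title here</title></head></html>",
    "fragment.html": "<!-- note --><div><p>text</p></div>",
    "empty.html": "just text, no tags",
}

for filename, html in samples.items():
    print(filename, first_tag_name(html))
```

For real files, the strings in samples would be replaced by the text read from each file. The sketch passes "html.parser" because parsers such as lxml may wrap fragments in extra html and body tags, which would change what counts as the first tag.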

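Since the question also asks about other tools, here is a rough sketch that uses only Python's standard library html.parser module. The class FirstTagFinder and the function first_tag_name_stdlib are hypothetical names introduced for illustration. This is one possible approach, not part of the original answer.

```python
from html.parser import HTMLParser


class FirstTagFinder(HTMLParser):
    """Remember the name of the first start tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.first_tag = None

    def handle_starttag(self, tag, attrs):
        # Only the first start tag is recorded; later ones are ignored.
        if self.first_tag is None:
            self.first_tag = tag


def first_tag_name_stdlib(html):
    """Return the first tag name in `html` (lowercased), or None."""
    finder = FirstTagFinder()
    finder.feed(html)
    finder.close()
    return finder.first_tag


print(first_tag_name_stdlib("<!DOCTYPE html>\n<html><head></head></html>"))
```

HTMLParser reports tag names in lowercase and passes doctype declarations and comments to separate handlers, so they are not mistaken for the first tag. The sketch still parses the whole input after the first tag is found, which is simple but not the fastest option for large files.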

Archived:	9 years, 5 months ago
Views:	12490
Last activity:	9 years, 5 months ago