Removing an HTML block in Python

Question


I would like to know whether Python has a library or some other way to extract elements from an HTML document. For example:

I have this document:

<html>
      <head>
          ...
      </head>
      <body>
          <div>
           ...
          </div>
      </body>
</html>


I want to remove the <div></div> block from the document, tags and contents alike, so that it looks like this:

<html>
    <head>
     ...
    </head>
    <body>
    </body>
</html>


Answer 1

Wso*_*Wso 7

You don't need a library for this. The built-in string methods are enough.

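def removeOneTag(text, tag):  # removes the first <tag>...</tag> block; returns text unchanged if there is none
    start = text.find("<" + tag + ">")
    end = text.find("</" + tag + ">", start) if start != -1 else -1
    return text if end == -1 else text[:start] + text[end + len(tag) + 3:]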


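This removes everything from the first opening tag through the first closing tag, tags included. With the input from your example, it would look like this: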

x = """<html>
    <head>
      ...
    </head>
    <body>
       <div>
         ...
       </div>
    </body>
</html>"""
print(removeOneTag(x, "div"))
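
One caveat with the snippet above: it only looks for a bare opening tag such as <div>, so an opening tag that carries attributes, such as <div class="note">, is never found. Here is a minimal sketch of an attribute-tolerant variant built on the standard re module. The name remove_first_block is hypothetical and not part of the original answer, and the pattern assumes that attribute values contain no ">" and that blocks of the same tag are not nested.

```python
import re

def remove_first_block(text, tag):
    # Hypothetical helper: matches "<tag>" or "<tag ...attributes...>" and
    # everything up to the nearest "</tag>". The match is non-greedy, so
    # nested blocks of the same tag are NOT handled correctly.
    pattern = re.compile(
        r"<" + re.escape(tag) + r"(\s[^>]*)?>.*?</" + re.escape(tag) + r"\s*>",
        re.DOTALL | re.IGNORECASE,
    )
    # count=1 removes only the first match; text without a match is returned as-is.
    return pattern.sub("", text, count=1)

html = '<body>\n<div class="note">\n...\n</div>\n</body>'
print(remove_first_block(html, "div"))
```

Because the tag name must be followed by whitespace or ">", a longer name such as <divider> is not mistaken for <div>.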


Then, if you want to remove every block with that tag...

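tag = "div"  # the tag whose blocks should be removed
# Loop only while an opening tag is still followed by a closing tag.
while "<" + tag + ">" in x and "</" + tag + ">" in x[x.find("<" + tag + ">"):]:
    x = removeOneTag(x, tag)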
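
Since the question asks about a library: plain string slicing gets confused when blocks of the same tag are nested or when tags carry attributes, and a real parser avoids both problems. Below is a minimal sketch using the standard library's html.parser module. BlockRemover and remove_blocks are hypothetical names, not part of the original answer, and the sketch assumes the tag being removed is a normal element with a closing tag (not a void element such as <img>).

```python
from html.parser import HTMLParser

class BlockRemover(HTMLParser):
    """Hypothetical helper: re-emits a document, skipping every <tag>...</tag> block."""

    def __init__(self, tag):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.tag = tag.lower()  # HTMLParser reports tag names in lowercase
        self.depth = 0          # > 0 while inside a block being removed
        self.out = []

    def emit(self, piece):
        # Keep output only when we are outside every removed block.
        if self.depth == 0:
            self.out.append(piece)

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1     # count nesting so inner blocks close correctly
        else:
            self.emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.depth = max(self.depth - 1, 0)
        else:
            self.emit("</" + tag + ">")

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>; dropped if it is the removed tag.
        if tag != self.tag:
            self.emit(self.get_starttag_text())

    def handle_data(self, data):
        self.emit(data)

    def handle_entityref(self, name):
        self.emit("&" + name + ";")

    def handle_charref(self, name):
        self.emit("&#" + name + ";")

    def handle_comment(self, data):
        self.emit("<!--" + data + "-->")

    def handle_decl(self, decl):
        self.emit("<!" + decl + ">")

def remove_blocks(html, tag):
    parser = BlockRemover(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

doc = '<body><div id="a">x<div>y</div>z</div><p>keep</p></body>'
print(remove_blocks(doc, "div"))  # prints <body><p>keep</p></body>
```

Whitespace around a removed block is left in place, so the result may contain an empty line where the block used to be. If installing a third-party package is an option, Beautiful Soup provides the same operation through Tag.decompose().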


Asked:	9 years, 5 months ago
Viewed:	4154 times
Last active:	9 years, 5 months ago