Using Python and regex, how do I remove <sup> tags from HTML?

Question


Using a Python regex, how do I remove all <sup> tags in HTML? The tags sometimes have styling, as shown below:

<sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup>


I would like to remove everything between and including the sup tags in a larger string of HTML.

Answer 1

ale*_*cxe 6

I would use an HTML parser instead (why). For example, BeautifulSoup's unwrap() can handle your beautiful sup:

"Tag.unwrap() is the opposite of wrap(). It replaces a tag with whatever's inside that tag. It's good for stripping out markup."

from bs4 import BeautifulSoup

data = """
<div>
    <sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup>
</div>
"""

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(data, "html.parser")
for sup in soup.find_all('sup'):
    sup.unwrap()  # drop the <sup> tag itself, keep its contents

print(soup.prettify())


Prints:


<div>
(1)
</div>
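Note that unwrap() keeps the text inside the tag, whereas the question asks to drop the contents as well. A minimal sketch of that variant, assuming BeautifulSoup 4 and its Tag.decompose() method, which removes a tag together with everything inside it. The sample string and the variable name result are made up for illustration:

```python
from bs4 import BeautifulSoup

# Illustrative input (not from the question): a footnote marker inside a sentence
data = """
<div>
    Revenue<sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup> grew
</div>
"""

soup = BeautifulSoup(data, "html.parser")
for sup in soup.find_all("sup"):
    # decompose() deletes the tag and its contents from the tree
    sup.decompose()

result = str(soup)
print(result)
```

Here both the <sup> markup and the "(1)" marker should be gone, while the surrounding text is left alone.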

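If a regex is really required, something like the following may be enough for simple, well-formed input. This is only a sketch under assumptions: the names SUP_RE and strip_sup are hypothetical, the sample string is made up, and the pattern will misbehave on nested <sup> tags or on attribute values that contain ">", which is part of why a parser is the safer choice.

```python
import re

# Matches an opening <sup ...> tag, the shortest possible body, and the closing tag.
# DOTALL lets the body span several lines; IGNORECASE also catches <SUP>.
SUP_RE = re.compile(r"<sup\b[^>]*>.*?</sup\s*>", re.IGNORECASE | re.DOTALL)


def strip_sup(html):
    """Return html with every <sup>...</sup> element removed, contents included."""
    return SUP_RE.sub("", html)


html = 'Revenue<sup style="vertical-align:top;line-height:120%;font-size:7pt">(1)</sup> grew.'
cleaned = strip_sup(html)
print(cleaned)  # Revenue grew.
```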

Archived:	11 years, 5 months ago
Views:	1515
Last recorded:	11 years, 5 months ago
