HTML indenter written in Python

Question

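I am looking for a free (as in freedom) HTML indenter (or re-indenter) written in Python, either a module or a command-line tool. I don't need to filter the HTML against a whitelist. I just want to indent (or re-indent) HTML source to make it more readable. For example, say I have the following code: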

<ul><li>Item</li><li>Item
</li></ul>

The output could look like this:

<ul>
    <li>Item</li>
    <li>Item</li>
</ul>

Note: I am not looking for an interface to non-Python software (for example Tidy, which is written in C), but for a 100% Python script.

Thanks a lot.

Answer 1

Eli*_*sha 6

You can use the toprettyxml function of the built-in module xml.dom.minidom:

>>> from xml.dom import minidom
>>> x = minidom.parseString("<ul><li>Item</li><li>Item\n</li></ul>")
>>> print x.toprettyxml()
<?xml version="1.0" ?>
<ul>
    <li>
        Item
    </li>
    <li>
        Item
    </li>
</ul>
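
The session above uses the Python 2 print statement. On Python 3, toprettyxml still indents with a tab by default and does not trim whitespace inside text nodes, so the trailing newline in the second item survives. Below is a minimal sketch of one way to get the four-space layout from the question. It assumes the input is well-formed XML, because minidom is an XML parser and raises an error on typical HTML such as an unclosed <br>. The helper names indent_xhtml and _strip_text are invented for this illustration and are not part of minidom.

```python
from xml.dom import minidom


def _strip_text(node):
    """Trim text nodes in place and drop whitespace-only ones (hypothetical helper)."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            stripped = child.data.strip()
            if stripped:
                child.data = stripped
            else:
                node.removeChild(child)
        else:
            _strip_text(child)


def indent_xhtml(markup, indent="    "):
    """Re-indent well-formed XML/XHTML markup (hypothetical helper).

    Serializing documentElement instead of the document leaves out
    the <?xml ...?> declaration.
    """
    doc = minidom.parseString(markup)
    _strip_text(doc)
    return doc.documentElement.toprettyxml(indent=indent)


if __name__ == "__main__":
    print(indent_xhtml("<ul><li>Item</li><li>Item\n</li></ul>"))
```

On recent Python 3 releases an element whose only child is a text node is written on a single line, so this should give the layout asked for in the question. Stripping text nodes changes significant whitespace, so it is only suitable when that whitespace does not matter.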

Answer 2

小智 5

Using BeautifulSoup

There are a dozen ways to use the BeautifulSoup module and its prettify function. Here are some examples to get you started.

From the command line

$ python -m BeautifulSoup < somefile.html > prettyfile.html

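In VIM (manually)

You don't have to write the file back to disk if you don't want to, but I have included the steps that give the same result as the command-line example.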

$ vi somefile.html
:%!python -m BeautifulSoup
:w prettyfile.html

In VIM (define a key mapping)

In ~/.vimrc, define:

nmap =h :%!python -m BeautifulSoup<CR>

Then, whenever you open a file in vim that needs prettifying:

$ vi somefile.html
=h
:w prettyfile.html

Again, saving the prettified output is optional.

Python shell

$ python
>>> from BeautifulSoup import BeautifulSoup as parse_html_string
>>> from os import path
>>> uglyfile = path.abspath('somefile.html')
>>> path.isfile(uglyfile)
True
>>> prettyfile = path.abspath(path.join('.', 'prettyfile.html'))
>>> path.exists(prettyfile)
>>> doc = None
>>> with open(uglyfile, 'r') as infile, open(prettyfile, 'w') as outfile:
...     # Assuming very simple case
...     htmldocstr = infile.read()
...     doc = parse_html_string(htmldocstr)
...     outfile.write(doc.prettify())

# That's it; you can manually manipulate the dom too though
>>> scripts = doc.findAll('script')
>>> meta = doc.findAll('meta')
>>> print doc.prettify()
[imagine beautiful html here]

>>> import jsbeautifier
>>> # `scripts` is a list of tags; beautify the contents of the first <script>
>>> print jsbeautifier.beautify(scripts[0].string)
[imagine beautiful script here]
>>>

Answer 3

Uku*_*kit 3

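BeautifulSoup has a function called prettify that does this. See this question.

**Except that it doesn't.** It gives only 1 space per indentation level and is not parameterizable, while the OP wants 4 spaces per level. It also does not let you specify tags that should not be indented, such as "<a>", or inline elements such as "<b>", "<i>", "<strong>" and so on. It has essentially zero parameterizability. That is why you have seen so many questions asking about this for more than a decade. (2 upvotes)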
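
The two things the comment above asks for, a configurable indent width and a set of inline tags that are not broken onto their own lines, can be sketched with only the standard library's html.parser. This is a rough illustration, not BeautifulSoup's behaviour. The names indent_html, _TreeBuilder, VOID_TAGS and DEFAULT_INLINE_TAGS are invented here. It assumes reasonably well-formed markup, and it collapses runs of whitespace in text, so it is not safe for <pre>, <script> or <style> content.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = frozenset({"area", "base", "br", "col", "embed", "hr", "img",
                       "input", "link", "meta", "source", "track", "wbr"})
DEFAULT_INLINE_TAGS = frozenset({"a", "b", "i", "em", "strong", "span", "code"})


class _TreeBuilder(HTMLParser):
    """Builds a minimal tree: text is stored as str, elements as dicts."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep &amp; etc. verbatim
        self.root = {"tag": None, "open": "", "void": False, "children": []}
        self.stack = [self.root]

    def _add_element(self, tag, void):
        node = {"tag": tag, "open": self.get_starttag_text(),
                "void": void, "children": []}
        self.stack[-1]["children"].append(node)
        if not void:
            self.stack.append(node)

    def _add_text(self, text):
        children = self.stack[-1]["children"]
        if children and isinstance(children[-1], str):
            children[-1] += text  # merge adjacent text pieces
        else:
            children.append(text)

    def handle_starttag(self, tag, attrs):
        self._add_element(tag, void=tag in VOID_TAGS)

    def handle_startendtag(self, tag, attrs):
        self._add_element(tag, void=True)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self._add_text(data)

    def handle_entityref(self, name):
        self._add_text("&%s;" % name)

    def handle_charref(self, name):
        self._add_text("&#%s;" % name)

    def handle_comment(self, data):
        self._add_text("<!--%s-->" % data)

    def handle_decl(self, decl):
        self._add_text("<!%s>" % decl)


def _is_inline(node, inline_tags):
    """Text, or an inline element that contains only inline content."""
    if isinstance(node, str):
        return True
    return (node["tag"] in inline_tags
            and all(_is_inline(c, inline_tags) for c in node["children"]))


def _flat(node):
    """Serialize a node on one line, collapsing whitespace runs in text."""
    if isinstance(node, str):
        return re.sub(r"\s+", " ", node)
    inner = "".join(_flat(c) for c in node["children"])
    return node["open"] + inner + ("" if node["void"] else "</%s>" % node["tag"])


def _render_children(children, depth, indent, inline_tags, out):
    run = []  # consecutive inline nodes that share one output line
    for child in list(children) + [None]:  # None flushes the final run
        if child is not None and _is_inline(child, inline_tags):
            run.append(child)
            continue
        line = "".join(_flat(c) for c in run).strip()
        if line:
            out.append(indent * depth + line)
        run = []
        if child is None:
            continue
        if child["void"] or all(_is_inline(c, inline_tags) for c in child["children"]):
            out.append(indent * depth + _flat(child).replace("> ", ">", 1)
                       if False else indent * depth + child["open"]
                       + "".join(_flat(c) for c in child["children"]).strip()
                       + ("" if child["void"] else "</%s>" % child["tag"]))
        else:
            out.append(indent * depth + child["open"])
            _render_children(child["children"], depth + 1, indent, inline_tags, out)
            out.append(indent * depth + "</%s>" % child["tag"])


def indent_html(markup, indent="    ", inline_tags=DEFAULT_INLINE_TAGS):
    """Return markup re-indented with `indent` per nesting level."""
    builder = _TreeBuilder()
    builder.feed(markup)
    builder.close()
    out = []
    _render_children(builder.root["children"], 0, indent, inline_tags, out)
    return "\n".join(out)
```

Calling indent_html on the question's input should give the four-line list the OP asked for. Passing indent="  " or a different inline_tags set changes the layout without touching the parser. An element is kept on one line whenever everything inside it is text or inline tags, which is a design choice of this sketch and not a rule taken from any library.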

Asked:	14 years, 2 months ago
Viewed:	4390 times
Last active:	9 years, 8 months ago