Extracting code from code directives in reStructuredText with docutils

Question


Asked by mja*_*ews, score 5. Tags: python, restructuredtext, docutils

I want to extract the source code verbatim from the code directives in a reStructuredText string.

What follows is my first attempt, but I'd like to know whether there is a better way (i.e. more robust, more general, or more direct) to do this.

Suppose I have the following text as a string in Python:

s = '''

My title
========

Use this to square a number.

.. code:: python

   def square(x):
       return x**2

and here is some javascript too.

.. code:: javascript

    foo = function() {
        console.log('foo');
    }

'''


To get the two code blocks, I can do:

from docutils.core import publish_doctree

doctree = publish_doctree(s)
source_code = [child.astext() for child in doctree.children
               if 'code' in child.attributes['classes']]


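Now source_code is a list containing only the verbatim source code of the two code blocks. If needed, I can also use the attributes of child to find out the language of each block.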
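As a sketch of that last point: the `code` directive stores its language argument as an extra class next to `code` (for example `['code', 'python']`), so the loop above can be extended to collect (language, source) pairs. This assumes docutils is installed. The `settings_overrides={'syntax_highlight': 'none'}` argument is an addition not in the original code: it skips Pygments tokenizing, because without Pygments a `code` directive with a language may produce a warning instead of a code block.

```python
from docutils.core import publish_doctree

s = '''
My title
========

Use this to square a number.

.. code:: python

   def square(x):
       return x**2

and here is some javascript too.

.. code:: javascript

    foo = function() {
        console.log('foo');
    }
'''

# 'none' disables syntax highlighting, so Pygments is not required
doctree = publish_doctree(s, settings_overrides={'syntax_highlight': 'none'})

blocks = []
for child in doctree.children:
    classes = child.attributes['classes']
    if 'code' in classes:
        # The language (if any) is the first class other than 'code'
        language = next((c for c in classes if c != 'code'), None)
        blocks.append((language, child.astext()))
```

Note that `blocks` still only covers top-level nodes, exactly like the original list comprehension.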

It gets the job done, but is there a better way?

Answer 1

Answered by 小智, score 5

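Your solution only finds code blocks at the top level of the document, and it can return false positives if the class "code" is used on other elements (unlikely, but possible). I would also check the type of the element/node, which is given by its .tagname attribute.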
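Both failure modes can be reproduced with a small hypothetical document: a code block nested inside a `note` admonition (not top level, so it is missed) and a `container` directive given the class `code` (not a code block, but it matches). This is a sketch assuming docutils is installed; `syntax_highlight='none'` is only there so the example does not depend on Pygments.

```python
from docutils.core import publish_doctree

text = '''
.. note::

   .. code:: python

      print("nested")

.. container:: code

   Just a paragraph, not code.
'''

doctree = publish_doctree(text, settings_overrides={'syntax_highlight': 'none'})

# The top-level approach from the question
top_level = [child.astext() for child in doctree.children
             if 'code' in child.attributes['classes']]

# The nested code block is missed, and the container's paragraph is
# returned even though it is not code
print(top_level)
```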

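Nodes have a "traverse" method (the document/doctree is just a special node) that performs a full traversal of the document tree. It visits every element in the document and returns only those that match a user-specified condition (a function that returns a boolean). Here is how: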

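def is_code_block(node):
    # Accept only literal_block nodes carrying the 'code' class. Checking
    # tagname first rejects Text nodes (which have no attributes dict)
    # before node.attributes is accessed.
    return (node.tagname == 'literal_block'
            and 'code' in node.attributes['classes'])

# traverse() walks the whole tree, not just doctree.children
code_blocks = doctree.traverse(condition=is_code_block)
source_code = [block.astext() for block in code_blocks]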
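A self-contained variant of the same idea, as a sketch: docutils 0.18.1 and later provide `Node.findall()` (an iterator) and deprecate `traverse()`, so the helper below uses `findall` when it exists and falls back to `traverse` otherwise. Passing the node class `nodes.literal_block` as the condition replaces the `tagname` comparison, and the language is read from the remaining classes. The function name `extract_code_blocks` is hypothetical, and `syntax_highlight='none'` is an assumption added so Pygments is not required.

```python
from docutils import nodes
from docutils.core import publish_doctree


def extract_code_blocks(rst_text):
    """Return (language, source) pairs for every code block, at any depth."""
    doctree = publish_doctree(rst_text,
                              settings_overrides={'syntax_highlight': 'none'})
    # findall() exists in docutils >= 0.18.1; older versions only have traverse()
    find = getattr(doctree, 'findall', doctree.traverse)
    result = []
    # A node class as the condition means isinstance(node, nodes.literal_block)
    for block in find(nodes.literal_block):
        classes = block['classes']
        if 'code' not in classes:
            continue  # e.g. a plain "::" literal block
        language = next((c for c in classes if c != 'code'), None)
        result.append((language, block.astext()))
    return result


sample = '''
Intro::

    not code

.. note::

   .. code:: python

      x = 1
'''

print(extract_code_blocks(sample))
```

The plain `::` literal block is skipped because it lacks the `code` class, while the block nested in the note is found.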


Archived: 11 years ago
Views: 482
Last activity: 6 years, 5 months ago