How to parse a Markdown list into a dictionary in Python?

d3p*_*3pd 6 python markdown parsing list

I have a list like this:

- launchers
   - say hello
      - command: echo "hello" | festival --tts
      - icon: sayHello.png
   - say world
      - command: echo "world" | festival --tts
      - icon: sayWorld.png
   - wait
      - command: for ((x = 0; x < 10; ++x)); do :; done
      - icon: wait.png

I'd like to parse it into a dictionary like this:

{
    "launchers": {
        "say hello": {
            "command": "echo \"hello\" | festival --tts",
            "icon": "sayHello.png"
        },
        "say world": {
            "command": "echo \"world\" | festival --tts",
            "icon": "sayWorld.png"
        },
        "wait": {
            "command": "for ((x = 0; x < 10; ++x)); do :; done",
            "icon": "wait.png"
        }
    }
}

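I've started writing some very manual code that counts the leading spaces (e.g. len(line.rstrip()) - len(line.rstrip().lstrip())), but I wonder whether there's a smarter way to solve this. I know JSON can be loaded into Python, but that doesn't suit my purposes. So how can I parse a Markdown list from a file into a dictionary in Python? Is there an efficient way to do this?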

Here's some basic code I'm using right now:

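with open("configuration.md", "r") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        # number of leading spaces = nesting depth of this list item
        indentation = len(line.rstrip()) - len(line.rstrip().lstrip())
        # split only at the first '-' (the list marker), so "--tts" survives
        listItem = line.split('-', 1)[1].strip()
        # split only at the first ':', so values that contain ':' survive
        listItemSplit = listItem.split(':', 1)
        key = listItemSplit[0].strip()
        if len(listItemSplit) == 2:
            value = listItemSplit[1].strip()
        else:
            value = ""
        print(indentation, key, value)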
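Calling str.split without a limit splits at every '-' and every ':'. This cuts off values that contain those characters, such as the "--tts" flag and the ':' inside the wait command. The quick check below runs on two lines from the list above and compares that behaviour with split(..., 1) and str.partition. Both are standard str methods that split only at the first occurrence. The variable names are illustrative only.

```python
dash_line = '      - command: echo "hello" | festival --tts'
colon_item = 'command: for ((x = 0; x < 10; ++x)); do :; done'

# Splitting at every '-' cuts the value off at "--tts"
naive_item = dash_line.split('-')[1].strip()
# maxsplit=1 splits only at the list marker
safe_item = dash_line.split('-', 1)[1].strip()

# Splitting at every ':' gives three parts for the wait command
naive_parts = colon_item.split(':')
# partition splits at the first ':' only and keeps the rest intact
key, _, value = colon_item.partition(':')

print(naive_item)        # command: echo "hello" | festival
print(safe_item)         # command: echo "hello" | festival --tts
print(len(naive_parts))  # 3
print(key, '->', value.strip())
```

With the three-way split, a check such as len(parts) == 2 fails for the wait line, so its command would silently become an empty string.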

Mar*_*ers 6

I'd assume a stricter format and use a stack and a regular expression:

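import re

# inputtext holds the Markdown list as a string
# groups: leading spaces, item name, optional value after ": "
line = re.compile(r'( *)- ([^:\n]+)(?:: ([^\n]*))?\n?')
depth = 0
stack = [{}]  # stack[-1] is the dict that currently receives items
for indent, name, value in line.findall(inputtext):
    indent = len(indent)
    if indent > depth:
        # a deeper item must be the first child of the branch just opened
        assert not stack[-1], 'unexpected indent'
    elif indent < depth:
        # back out one level (assumes a dedent of exactly one level)
        stack.pop()
    stack[-1][name] = value or {}
    if not value:
        # new branch
        stack.append(stack[-1][name])
    depth = indent

result = stack[0]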

This produces:

>>> import re
>>> inputtext = '''\
... - launchers
...    - say hello
...       - command: echo "hello" | festival --tts
...       - icon: sayHello.png
...    - say world
...       - command: echo "world" | festival --tts
...       - icon: sayWorld.png
...    - wait
...       - command: for ((x = 0; x < 10; ++x)); do :; done
...       - icon: wait.png
... '''
>>> line = re.compile(r'( *)- ([^:\n]+)(?:: ([^\n]*))?\n?')
>>> depth = 0
>>> stack = [{}]
>>> for indent, name, value in line.findall(inputtext):
...     indent = len(indent)
...     if indent > depth:
...         assert not stack[-1], 'unexpected indent'
...     elif indent < depth:
...         stack.pop()
...     stack[-1][name] = value or {}
...     if not value:
...         # new branch
...         stack.append(stack[-1][name])
...     depth = indent
... 
{'command': 'echo "hello" | festival --tts', 'icon': 'sayHello.png'}
{'command': 'echo "world" | festival --tts', 'icon': 'sayWorld.png'}
>>> result = stack[0]
>>> from pprint import pprint
>>> pprint(result)
{'launchers': {'say hello': {'command': 'echo "hello" | festival --tts',
                             'icon': 'sayHello.png'},
               'say world': {'command': 'echo "world" | festival --tts',
                             'icon': 'sayWorld.png'},
               'wait': {'command': 'for ((x = 0; x < 10; ++x)); do :; done',
                        'icon': 'wait.png'}}}

That is the result for your input text.
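The stack loop above has two limitations. It pops exactly one level per dedent, so an item that jumps back two or more levels is attached to the wrong parent. An item at the same level as a preceding childless branch is also nested inside that branch, so "- a" followed by "- b" gives {'a': {'b': {}}}. The sketch below handles both cases. It keeps the same format assumptions: space-only indentation and items written as "- key" or "- key: value". It reuses the same regular expression. For each open dict it stores the indent of the item that opened it, then pops until it reaches the real parent. The name parse_md_list is hypothetical.

```python
import re

# Same pattern as in the answer: leading spaces, name, optional ": value"
ITEM = re.compile(r'( *)- ([^:\n]+)(?:: ([^\n]*))?\n?')


def parse_md_list(text):
    """Parse a nested '- key' / '- key: value' list into nested dicts.

    Assumes space-only indentation, with children indented more than their parent.
    """
    root = {}
    # Each entry: (indent of the item that opened this dict, the dict itself).
    # The root counts as opened at indent -1, so top-level items are its children.
    stack = [(-1, root)]
    for indent_str, name, value in ITEM.findall(text):
        indent = len(indent_str)
        # Close every branch this item is not nested inside of
        while indent <= stack[-1][0]:
            stack.pop()
        parent = stack[-1][1]
        if value:
            parent[name] = value          # leaf: key with a string value
        else:
            child = {}
            parent[name] = child          # branch: its children follow
            stack.append((indent, child))
    return root


print(parse_md_list("- a\n   - b\n      - c: 1\n- d\n"))
# {'a': {'b': {'c': '1'}}, 'd': {}}
```

To read the question's file, pass in its contents, e.g. parse_md_list(open("configuration.md").read()).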