Python: parsing bracketed blocks

Mar*_*tin 30 python parsing brackets text-parsing

What is the best way in Python to parse out the chunks of text contained in matching brackets?

"{ { a } { b } { { { c } } } }"

should initially return:

[ "{ a } { b } { { { c } } }" ]

Feeding that back in as input should return:

[ "a", "b", "{ { c } }" ]

which, on successive passes, should return:

[ "{ c }" ]

[ "c" ]

[]

Pau*_*McG 41

Or this pyparsing version:

>>> from pyparsing import nestedExpr
>>> txt = "{ { a } { b } { { { c } } } }"
>>>
>>> nestedExpr('{','}').parseString(txt).asList()
[[['a'], ['b'], [[['c']]]]]
>>>

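  • http://pythonhosted.org//pyparsing/ FTW :-) Thanks to you and pyparsing. (3 upvotes)
  • This is more readable than the algorithm in the accepted answer, and I also trust a widely used (and therefore well-tested) library more than a solution I wrote myself. One very important thing to note: if your whole expression is not surrounded by a pair of grouping symbols, this will only process the **first** grouped expression. If you always want to process the whole expression, you can force that by adding a pair of outer grouping symbols when they are not already present. (2 upvotes)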
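The wrapping step described in the comment above could be sketched as follows. This is a minimal, standard-library-only illustration; `is_wrapped` and `ensure_wrapped` are hypothetical helper names (not part of pyparsing), and the sketch assumes single-character grouping symbols.

```python
def is_wrapped(text, opener="{", closer="}"):
    """Return True if one outer pair of grouping symbols encloses all of text."""
    s = text.strip()
    if not (s.startswith(opener) and s.endswith(closer)):
        return False
    depth = 0
    for i, ch in enumerate(s):
        if ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth == 0:
                # The group opened by the first character just closed; it
                # wraps everything only if this is the last character.
                return i == len(s) - 1
    return False  # unbalanced: the outer group never closes


def ensure_wrapped(text, opener="{", closer="}"):
    """Add an outer pair of grouping symbols unless one is already present."""
    if is_wrapped(text, opener, closer):
        return text
    return "{0} {1} {2}".format(opener, text, closer)


print(ensure_wrapped("{ a } { b }"))
print(ensure_wrapped("{ { a } { b } }"))
```

The result of `ensure_wrapped(txt)` can then be passed to `nestedExpr('{','}').parseString(...)` in place of `txt`, so that every top-level group is inside the single outer expression that gets parsed.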

Cla*_*diu 24

Pseudocode:

For each string in the array:
    Find the first '{'. If there is none, leave that string alone.
    Init a counter to 0.
    For each character in the string, starting at that first '{':
        If you see a '{', increment the counter.
        If you see a '}', decrement the counter.
        If the counter reaches 0, break.
    Here, if your counter is not 0, you have invalid input (unbalanced brackets).
    If it is 0, take the string from the first '{' up to the '}' that put the
    counter at 0; that is a new element in your array.
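The pseudocode above might translate to Python roughly as follows. This is only a sketch: the function name `first_balanced_group` and its `opener`/`closer` parameters are illustrative choices, and it starts counting at the first '{' so that leading text does not end the scan early.

```python
def first_balanced_group(text, opener="{", closer="}"):
    """Return the substring from the first opener through its matching closer.

    Returns None if text contains no opener; raises ValueError if the
    brackets are unbalanced.
    """
    start = text.find(opener)
    if start == -1:
        return None  # nothing to extract, leave the string alone
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth == 0:
                # This closer matches the first opener.
                return text[start:i + 1]
    raise ValueError("unbalanced brackets in %r" % text)


strings = ["x { a { b } } y", "no brackets here"]
groups = [g for g in map(first_balanced_group, strings) if g is not None]
print(groups)
```

As in the pseudocode, the returned element keeps its enclosing brackets; stripping them is a separate step.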


Jac*_* M. 6

I'm new to Python, so go easy on me, but here is an implementation that works:

def balanced_braces(args):
    """Return the contents of every top-level {...} group in the given strings."""
    parts = []
    for arg in args:
        if '{' not in arg:
            continue
        chars = []
        n = 0  # current nesting depth
        for c in arg:
            if c == '{':
                # Keep inner braces, drop the outermost one.
                if n > 0:
                    chars.append(c)
                n += 1
            elif c == '}':
                n -= 1
                if n > 0:
                    chars.append(c)
                elif n == 0:
                    # A top-level group just closed: emit its contents.
                    parts.append(''.join(chars).strip())
                    chars = []
            elif n > 0:
                chars.append(c)
    return parts

t1 = balanced_braces(["{{ a } { b } { { { c } } } }"])
print(t1)
t2 = balanced_braces(t1)
print(t2)
t3 = balanced_braces(t2)
print(t3)
t4 = balanced_braces(t3)
print(t4)

Output:

['{ a } { b } { { { c } } }']
['a', 'b', '{ { c } }']
['{ c }']
['c']
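`balanced_braces` peels off one nesting level per call. For comparison, a single pass with a stack can build the whole nested structure at once. The following is a sketch of that alternative technique, not part of the answer above; `parse_nested` is a hypothetical name, and runs of text between brackets are kept as single stripped strings.

```python
def parse_nested(text, opener="{", closer="}"):
    """Parse bracketed text into nested lists in one pass."""
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    buf = []        # characters of the text run being collected

    def flush():
        # Store the pending text run (if any) in the current list.
        run = "".join(buf).strip()
        if run:
            stack[-1].append(run)
        del buf[:]

    for ch in text:
        if ch == opener:
            flush()
            child = []
            stack[-1].append(child)
            stack.append(child)  # descend into the new group
        elif ch == closer:
            flush()
            if len(stack) == 1:
                raise ValueError("unmatched closing bracket")
            stack.pop()          # return to the enclosing group
        else:
            buf.append(ch)
    flush()
    if len(stack) != 1:
        raise ValueError("unmatched opening bracket")
    return root


print(parse_nested("{ { a } { b } { { { c } } } }"))
# prints [[['a'], ['b'], [[['c']]]]]
```

Each list in the result corresponds to one pair of brackets, so the depth of nesting in the output mirrors the depth in the input.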


jfs*_*jfs 5

Parsing with lepl (installable via $ easy_install lepl):

from lepl import Any, Delayed, Node, Space

expr = Delayed()
expr += '{' / (Any() | expr[1:,Space()[:]]) / '}' > Node

print(expr.parse("{{a}{b}{{{c}}}}")[0])

Output:

Node
 +- '{'
 +- Node
 |   +- '{'
 |   +- 'a'
 |   `- '}'
 +- Node
 |   +- '{'
 |   +- 'b'
 |   `- '}'
 +- Node
 |   +- '{'
 |   +- Node
 |   |   +- '{'
 |   |   +- Node
 |   |   |   +- '{'
 |   |   |   +- 'c'
 |   |   |   `- '}'
 |   |   `- '}'
 |   `- '}'
 `- '}'