Mar*_*tin 30 python parsing brackets text-parsing
What is the best way in Python to parse out chunks of text contained in matching brackets?
"{ { a } { b } { { { c } } } }"
should initially return:
[ "{ a } { b } { { { c } } }" ]
Putting that in as input should return:
[ "a", "b", "{ { c } }" ]
Feeding each result back in should then return, in turn:
[ "{ c }" ]
[ "c" ]
[]
Pau*_*McG 41
Or this pyparsing version:
>>> from pyparsing import nestedExpr
>>> txt = "{ { a } { b } { { { c } } } }"
>>>
>>> nestedExpr('{','}').parseString(txt).asList()
[[['a'], ['b'], [[['c']]]]]
>>>
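For readers who cannot install pyparsing, the same nested-list shape can be approximated with the standard library alone. This is only a minimal sketch, not how nestedExpr works internally: the name parse_nested is made up for illustration, and it assumes that tokens are separated by whitespace or braces and that the input contains no quoted strings.

```python
def parse_nested(text, opener="{", closer="}"):
    """Return nested lists mirroring the brace structure of `text`.

    Hypothetical helper for illustration; not part of pyparsing.
    """
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    # Pad the delimiters with spaces so split() yields them as separate tokens.
    padded = text.replace(opener, f" {opener} ").replace(closer, f" {closer} ")
    for tok in padded.split():
        if tok == opener:
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif tok == closer:
            if len(stack) == 1:
                raise ValueError("unbalanced closing brace")
            stack.pop()
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced opening brace")
    return root


print(parse_nested("{ { a } { b } { { { c } } } }"))
# prints [[['a'], ['b'], [[['c']]]]]
```

The explicit stack avoids recursion, so deeply nested input does not hit Python's recursion limit.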
Cla*_*diu 24
Pseudocode:
For each string in the array:
    Find the first '{'. If there is none, leave that string alone.
    Init a counter to 0.
    For each character in the string:
        If you see a '{', increment the counter.
        If you see a '}', decrement the counter.
        If the counter reaches 0, break.
    Here, if your counter is not 0, you have invalid input (unbalanced brackets).
    If it is, then take the string from the first '{' up to the '}' that put the
    counter at 0, and that is a new element in your array.
I'm new to Python, so go easy on me, but here is an implementation that works:
def balanced_braces(args):
    parts = []
    for arg in args:
        # Strings without braces contribute nothing to the result.
        if '{' not in arg:
            continue
        chars = []
        n = 0  # current nesting depth
        for c in arg:
            if c == '{':
                # Keep inner braces; drop the outermost opening brace.
                if n > 0:
                    chars.append(c)
                n += 1
            elif c == '}':
                n -= 1
                if n > 0:
                    chars.append(c)
                elif n == 0:
                    # A top-level group just closed: emit its contents.
                    parts.append(''.join(chars).strip())
                    chars = []
            elif n > 0:
                chars.append(c)
    return parts

t1 = balanced_braces(["{{ a } { b } { { { c } } } }"])
print(t1)
t2 = balanced_braces(t1)
print(t2)
t3 = balanced_braces(t2)
print(t3)
t4 = balanced_braces(t3)
print(t4)
Output:
['{ a } { b } { { { c } } }']
['a', 'b', '{ { c } }']
['{ c }']
['c']
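The pseudocode treats a non-zero counter as invalid input, whereas balanced_braces passes over unbalanced brackets without complaint. A possible variant that reports them is sketched below. The name top_level_groups and the exact error messages are illustrative assumptions, and it takes one string instead of a list.

```python
def top_level_groups(text, opener="{", closer="}"):
    """Return the stripped contents of each top-level brace group in `text`.

    Hypothetical variant for illustration; raises ValueError when the
    braces do not balance.
    """
    groups = []
    depth = 0
    start = None  # index just past the opener of the current top-level group
    for i, ch in enumerate(text):
        if ch == opener:
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth < 0:
                raise ValueError(f"unmatched {closer!r} at index {i}")
            if depth == 0:
                # Slice the original string instead of collecting characters.
                groups.append(text[start:i].strip())
    if depth != 0:
        raise ValueError(f"{depth} unclosed {opener!r}")
    return groups


print(top_level_groups("{ { a } { b } { { { c } } } }"))
# prints ['{ a } { b } { { { c } } }']
print(top_level_groups("{ a } { b } { { { c } } }"))
# prints ['a', 'b', '{ { c } }']
```

Slicing by index keeps the inner text exactly as written, and the depth checks catch both a stray closing brace and a group that is never closed.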
Parsing using lepl (installable via $ easy_install lepl):
from lepl import Any, Delayed, Node, Space
expr = Delayed()
expr += '{' / (Any() | expr[1:,Space()[:]]) / '}' > Node
print(expr.parse("{{a}{b}{{{c}}}}")[0])
Output:
Node
 +- '{'
 +- Node
 |   +- '{'
 |   +- 'a'
 |   `- '}'
 +- Node
 |   +- '{'
 |   +- 'b'
 |   `- '}'
 +- Node
 |   +- '{'
 |   +- Node
 |   |   +- '{'
 |   |   +- Node
 |   |   |   +- '{'
 |   |   |   +- 'c'
 |   |   |   `- '}'
 |   |   `- '}'
 |   `- '}'
 `- '}'