Parsing logical sentences with pyparsing is very slow

Question

I am trying to use pyparsing to parse logical expressions such as these:

x
FALSE
NOT x
(x + y <= 5) AND (y >= 10) OR NOT (z < 100 OR w)

(A=True OR NOT (G < 8) => S = J) => ((P = A) AND not (P = 1) AND (B = O)) => (S = T)

((P = T) AND NOT (K =J) AND (B = F)) => (S = O) AND
 ((P = T) OR (k and b => (8 + z <= 10)) AND NOT (a + 9 <= F)) => (7 = a + z)

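The code I wrote below seems to work fine, but it is very slow (for example, the last example above takes a few seconds). Did I structure the grammar in some inefficient way? Should I use recursion instead of operatorPrecedence? Is there a way to speed it up?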

identifier = Group(Word(alphas, alphanums + "_")  +  Optional("'"))
num = Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")
operator = Regex(">=|<=|!=|>|<|=")
operand = identifier |  num  
aexpr = operatorPrecedence(operand,
                           [('*',2,opAssoc.LEFT,),
                            ('+',2,opAssoc.LEFT,),
                            (operator,2,opAssoc.LEFT,)
                            ])

op_prec = [(CaselessLiteral('not'),1,opAssoc.RIGHT,),
           (CaselessLiteral('and'),2,opAssoc.LEFT ,),
           (CaselessLiteral('or'), 2,opAssoc.LEFT ,),
           ('=>', 2,opAssoc.LEFT ,),
           ]
sentence = operatorPrecedence(aexpr,op_prec)
return sentence

Answer 1

Tes*_*ore 6

I had the same problem. I found a solution here (ParserElement.enablePackrat()): http://pyparsing.wikispaces.com/share/view/26068641?replyId=26084853

The following code now parses instantly (it took 60 seconds before):

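from pyparsing import *

# Turn on packrat memoization before building the grammar
ParserElement.enablePackrat()

integer  = Word(nums).setParseAction(lambda t:int(t[0]))('int')
# `variable` is not defined in the original post; assumed here to be a plain identifier
variable = Word(alphas, alphanums + "_")
operand  = integer | variable('var')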

# Left precedence
eq    = Literal("==")('eq')
gt    = Literal(">")('gt')
gtEq  = Literal(">=")('gtEq')
lt    = Literal("<")('lt')
ltEq  = Literal("<=")('ltEq')
notEq = Literal("!=")('notEq')
mult  = oneOf('* /')('mult')
plus  = oneOf('+ -')('plus')

_and  = oneOf('&& and')('and')
_or   = oneOf('|| or')('or')

# Right precedence
sign     = oneOf('+ -')('sign')
negation = Literal('!')('negation')

# Operator groups per precedence
right_op = negation | sign 

# Highest precedence
left_op_1 = mult 
left_op_2 = plus 
left_op_3 = gtEq | ltEq | lt | gt
left_op_4 = eq   | notEq
left_op_5 = _and
left_op_6 = _or
# Lowest precedence

condition = operatorPrecedence( operand, [
     (right_op,   1, opAssoc.RIGHT),
     (left_op_1,  2, opAssoc.LEFT),
     (left_op_2,  2, opAssoc.LEFT),
     (left_op_3,  2, opAssoc.LEFT),
     (left_op_4,  2, opAssoc.LEFT),
     (left_op_5,  2, opAssoc.LEFT),
     (left_op_6,  2, opAssoc.LEFT)
    ]
)('computation')
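
To see why memoization can make such a large difference, here is a minimal pure-Python sketch. It does not use pyparsing and is not its implementation. It only models the assumption that each precedence level first tries "operand OP operand" and otherwise falls back to parsing the operand again from scratch. The operators in `OPS`, the single-character tokens, and the function `count_calls` are all hypothetical names chosen for the illustration.

```python
# Toy model only: single-character tokens, hypothetical operators.
OPS = ["|", "&", "+", "*"]  # lowest to highest precedence


def count_calls(text, use_cache):
    """Parse `text`; return (end position or None, number of rule evaluations)."""
    memo = {}
    calls = 0

    def parse(level, pos):
        nonlocal calls
        key = (level, pos)
        if use_cache and key in memo:
            return memo[key]  # packrat hit: reuse the earlier result
        calls += 1
        result = atom(pos) if level == len(OPS) else binary(level, pos)
        memo[key] = result
        return result

    def atom(pos):
        ch = text[pos:pos + 1]
        if ch == "(":
            end = parse(0, pos + 1)  # restart at the lowest precedence level
            return end + 1 if end is not None and text[end:end + 1] == ")" else None
        return pos + 1 if ch.isalnum() else None

    def binary(level, pos):
        # Alternative 1: operand OP operand (OP operand)*
        first = parse(level + 1, pos)
        if first is not None and text[first:first + 1] == OPS[level]:
            end = parse(level + 1, first + 1)
            if end is not None:
                while text[end:end + 1] == OPS[level]:
                    nxt = parse(level + 1, end + 1)
                    if nxt is None:
                        break
                    end = nxt
                return end
        # Alternative 2: no operator here, so retry the operand from scratch
        return parse(level + 1, pos)

    return parse(0, 0), calls


text = "(((a)))"
print("without cache:", count_calls(text, use_cache=False))
print("with cache:   ", count_calls(text, use_cache=True))
```

In this model the uncached work roughly doubles per precedence level and multiplies again for every level of parentheses, while the cached variant evaluates each (level, position) pair only once. That would be consistent with the jump from 60 seconds to "instantly" reported above, although the real numbers depend on the grammar.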

Answer 2

tow*_*owi 4

I put your code into a small program:

from sys import argv
from pyparsing import *

def parsit(aexpr):
    identifier = Group(Word(alphas, alphanums + "_")  +  Optional("'"))
    num = Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")
    operator = Regex(">=|<=|!=|>|<|=")
    operand = identifier |  num
    aexpr = operatorPrecedence(operand,
                               [('*',2,opAssoc.LEFT,),
                                ('+',2,opAssoc.LEFT,),
                                (operator,2,opAssoc.LEFT,)
                                ])

    op_prec = [(CaselessLiteral('not'),1,opAssoc.RIGHT,),
               (CaselessLiteral('and'),2,opAssoc.LEFT ,),
               (CaselessLiteral('or'), 2,opAssoc.LEFT ,),
               ('=>', 2,opAssoc.LEFT ,),
               ]
    sentence = operatorPrecedence(aexpr,op_prec)
    return sentence

def demo02(arg):
    sent = parsit(arg)
    print(arg, ":", sent.parseString(arg))

def demo01():
    for arg in ["x", "FALSE", "NOT x",
                  "(x + y <= 5) AND (y >= 10) OR NOT (z < 100 OR w)",
                  "(A=True OR NOT (G < 8) => S = J) => ((P = A) AND not (P = 1) AND (B = O)) => (S = T)",
                  "((P = T) AND NOT (K =J) AND (B = F)) => (S = O) AND ((P = T) OR (k and b => (8 + z <= 10)) AND NOT (a + 9 <= F)) => (7 = a + z)"
                  ]:
        demo02(arg)


if len(argv) <= 1:
    demo01()
else:
    for arg in argv[1:]:
        demo02(arg)

and ran it through cProfile:

$ python -m cProfile pyparsetest.py

You will find many parseImpl calls, but in the middle of the output there is:

2906500/8   26.374    0.000   72.667    9.083 pyparsing.py:913(_parseNoCache)
212752/300    1.045    0.000   72.608    0.242 pyparsing.py:985(tryParse)

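That is 72.667 seconds of cumulative time out of a total of about 72 seconds, so essentially the whole run is spent there.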

So I would venture the guess that caching would provide a good lever.
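
For reading such a table: in cProfile output an `ncalls` value written as `a/b` (like `2906500/8`) means total calls versus primitive, non-recursive calls, and the cumulative column includes time spent in sub-calls. The following is a minimal sketch of collecting the same kind of report from inside Python, sorted by cumulative time. `slow_fib` and `profile_top` are hypothetical stand-ins, not part of the program above. In practice you would pass the parsing function instead.

```python
import cProfile
import io
import pstats


def slow_fib(n):
    """Deliberately recursive stand-in for an expensive parse call."""
    return n if n < 2 else slow_fib(n - 1) + slow_fib(n - 2)


def profile_top(func, *args, limit=5):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the dominant call chain comes first
    stats.sort_stats("cumulative").print_stats(limit)
    return result, buffer.getvalue()


result, report = profile_top(slow_fib, 18)
print(report)
```

On the command line, `python -m cProfile -s cumulative pyparsetest.py` gives the same ordering without changing the script.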

Unfortunately, just enabling packrat as described at http://pyparsing-public.wikispaces.com/FAQs did not help. I added these lines:

import pyparsing
pyparsing.usePackrat = True

and the runtime was identical.

The number regex also looks fine to me; quite standard, I guess. For example, replacing it with

#num = Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")
num = Regex(r"8|1|10|100|5")

did not help either. My simple variant has no "empty match", which I guessed might be a problem, but apparently it is not.

My last attempt was to look at the resulting parser with:

....
sentence = operatorPrecedence(aexpr,op_prec)
print(sentence)
return sentence
....

And... wow... it is long!

Well, not using your first operatorPrecedence is a lot faster, but then it no longer works for arithmetic.

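So I would venture the guess that, yes, you should try to separate the two kinds of expressions (boolean and arithmetic) more. Maybe that will improve it. I will look into it too; it interests me as well.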
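
The question also asks whether plain recursion would do better than operatorPrecedence. As a rough illustration of that alternative, here is a small hand-written recursive-descent parser with separate boolean and arithmetic layers. It is a swapped-in technique rather than a pyparsing grammar, and it makes several assumptions: numbers are unsigned, the precedence order follows the lists in the question, and a parenthesized group may contain any expression. All names (`tokenize`, `Parser`, `parse`) are hypothetical.

```python
import re

# Toy tokenizer: unsigned numbers, identifiers, and the operators from the question.
TOKEN = re.compile(r"\s*(=>|>=|<=|!=|\d+(?:\.\d*)?|[A-Za-z_]\w*'?|[()+*<>=])")
COMPARE = {">=", "<=", "!=", ">", "<", "="}
KEYWORDS = {"not", "and", "or"}


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.i += 1
        return token

    def chain(self, operand, is_op):
        """Left-associative layer: operand (op operand)*, no backtracking."""
        node = operand()
        while self.peek() is not None and is_op(self.peek()):
            node = [node, self.take(), operand()]
        return node

    # Boolean layers, lowest precedence first.
    def implication(self):
        return self.chain(self.disjunction, lambda t: t == "=>")

    def disjunction(self):
        return self.chain(self.conjunction, lambda t: t.lower() == "or")

    def conjunction(self):
        return self.chain(self.negation, lambda t: t.lower() == "and")

    def negation(self):
        if (self.peek() or "").lower() == "not":
            return [self.take(), self.negation()]
        return self.comparison()

    # Arithmetic layers.
    def comparison(self):
        return self.chain(self.total, lambda t: t in COMPARE)

    def total(self):
        return self.chain(self.product, lambda t: t == "+")

    def product(self):
        return self.chain(self.atom, lambda t: t == "*")

    def atom(self):
        token = self.take()
        if token == "(":
            node = self.implication()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        is_word = token is not None and (token[0].isalnum() or token[0] == "_")
        if not is_word or token.lower() in KEYWORDS:
            raise SyntaxError(f"unexpected token {token!r}")
        return token


def parse(text):
    parser = Parser(text)
    tree = parser.implication()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("(x + y <= 5) AND (y >= 10) OR NOT (z < 100 OR w)"))
```

Each token is consumed exactly once, so the running time grows linearly with the input. The trade-off is that the grammar is now hard-coded in methods instead of being declared as a pyparsing table.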

Archived:	13 years ago
Views:	1717
Last activity:	11 years, 8 months ago