Posts by Tib*_*tim

Strange regex behavior

I'm writing a program that generates tokens from assembly source code, but I've run into a strange problem.

Sometimes the code works as expected, and sometimes it doesn't!

Here is the code (the variable names are in Portuguese, but I added translations):

import re

def tokenize(code):
    tokens = []

    tokens_re = {
    'comentarios'  : '(//.*)',                         # comments
    'linhas'       : '(\n)',                           # lines
    'instrucoes'   : '(add)',                          # instructions
    'numeros_hex'  : '([-+]?0x[0-9a-fA-F]+)',          # hex numbers
    'numeros_bin'  : '([-+]?0b[0-1]+)',                # binary numbers
    'numeros_dec'  : '([-+]?[0-9]+)'}                  # decimal numbers

    #'reg32'        : 'eax|ebx|ecx|edx|esp|ebp|eip|esi',
    #'reg16'        : 'ax|bx|cx|dx|sp|bp|ip|si',
    #'reg8'         : 'ah|al|bh|bl|ch|cl|dh|dl'}

    pattern = re.compile('|'.join(list(tokens_re.values())))
    scan = pattern.scanner(code)

    while 1:
        m = scan.search()
        if not m:
            break

        tipo = list(tokens_re.keys())[m.lastindex-1]     # type
        valor = repr(m.group(m.lastindex))               # value …
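One plausible cause, assuming the code runs on Python 3.5 or earlier: there, dict iteration order over `str` keys is not guaranteed and, with hash randomization (on by default since Python 3.3), can change between runs. Since `re` tries the alternatives of `a|b|c` left to right and keeps the first one that matches (not the longest), a different order of `tokens_re.values()` can let `numeros_dec` take just the `0` of `0x1F` before `numeros_hex` is tried. The sketch below is illustrative only: `TOKEN_SPEC`, `dec_re`, `hex_re` and `first_match` are hypothetical names. It shows the order dependence, then pins the order with a list of `(name, regex)` pairs and named groups, and uses the documented `re.finditer` instead of the undocumented `scanner()` method.

```python
import re

# Hypothetical demo: the same two alternatives joined in different orders.
dec_re = r'[-+]?[0-9]+'
hex_re = r'[-+]?0x[0-9a-fA-F]+'


def first_match(alternatives, text):
    """Match text at position 0 against the alternatives, tried in list order."""
    return re.match('|'.join(alternatives), text).group()


print(first_match([hex_re, dec_re], '0x1F'))  # 0x1F
print(first_match([dec_re, hex_re], '0x1F'))  # 0

# Hypothetical ordered token table: a list keeps the priority fixed in the
# source code instead of depending on dict ordering. The more specific numeric
# forms (hex, binary) come before the generic decimal rule.
TOKEN_SPEC = [
    ('comentarios', r'//.*'),                  # comments
    ('linhas',      r'\n'),                    # lines
    ('instrucoes',  r'add'),                   # instructions
    ('numeros_hex', r'[-+]?0x[0-9a-fA-F]+'),   # hex numbers
    ('numeros_bin', r'[-+]?0b[01]+'),          # binary numbers
    ('numeros_dec', r'[-+]?[0-9]+'),           # decimal numbers
]

# One named group per token type, so m.lastgroup gives the type directly.
# %-formatting keeps this runnable on Python versions without f-strings.
TOKEN_RE = re.compile('|'.join('(?P<%s>%s)' % (name, regex)
                               for name, regex in TOKEN_SPEC))


def tokenize(code):
    """Return (type, value) pairs; characters no rule matches are skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(code)]


print(tokenize('add 0x1F, 0b101 // x\n'))
```

With the order written out explicitly, the result of `tokenize` no longer depends on the interpreter version or the hash seed.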


python regex tokenize python-3.x
