spe*_*ler 7 c python regex clang pycparser
I have this code in my C file:
printf("Worker name is %s and id is %d", worker.name, worker.id);
Using Python, I would like to parse the format string and locate "%s" and "%d".
So I want a function like this:
>>> my_function("Worker name is %s and id is %d")
[Out1]: ((15, "%s"), (28, "%d"))
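I tried to do this with libclang's Python bindings and with pycparser, but I could not see how to do it with these tools.
I also tried solving this with regular expressions, but that is not simple: consider cases where the printf string contains "%%s" and similar.
Both gcc and clang clearly do this as part of compilation. Has nobody exported this logic to Python?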
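You can certainly find well-formed candidates with a regular expression.
Take a look at the definition of the C format specification. (I am using Microsoft's, but use whichever you prefer.)
It is: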
%[flags] [width] [.precision] [{h | l | ll | w | I | I32 | I64}] type
You also have the special case of %%, which printf turns into a literal %.
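To see why %% needs its own branch, here is a minimal sketch. The `NAIVE` and `WITH_ESCAPE` patterns and the sample string are made up for illustration and are deliberately much simpler than the full specification:

```python
import re

# Deliberately naive pattern (illustration only): "%" followed by one letter.
NAIVE = re.compile(r"%[a-zA-Z]")
# Same idea, but "%%" is tried first so it is consumed as a single unit.
WITH_ESCAPE = re.compile(r"%%|%[a-zA-Z]")

fmt = "100%%s done, %d left"

# printf prints "%%" as a literal "%", so the "s" after it is plain text.
# The naive pattern still reports "%s" at index 4, which is a false positive.
naive_hits = [(m.start(), m.group()) for m in NAIVE.finditer(fmt)]

# Matching "%%" explicitly keeps the scanner from starting in its middle.
escaped_hits = [(m.start(), m.group()) for m in WITH_ESCAPE.finditer(fmt)]

print(naive_hits)    # [(4, '%s'), (13, '%d')]
print(escaped_hits)  # [(3, '%%'), (13, '%d')]
```

Because `re.finditer` never returns overlapping matches, consuming "%%" as one match is enough to stop its second "%" from starting a bogus specifier.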
You can translate that format specification, together with the %% case, into a regular expression:
(                                 # start of capture group 1
%                                 # literal "%"
(?:                               # first option
(?:[-+0 #]{0,5})                  # optional flags
(?:\d+|\*)?                       # width
(?:\.(?:\d+|\*))?                 # precision
(?:h|l|ll|w|I|I32|I64)?           # size
[cCdiouxXeEfgGaAnpsSZ]            # type
) |                               # OR
%%)                               # literal "%%"
Then put it into a Python regular expression:
import re
lines='''\
Worker name is %s and id is %d
That is %i%%
%c
Decimal: %d  Justified: %.6d
%10c%5hc%5C%5lc
The temp is %.*f
%ss%lii
%*.*s | %.3d | %lC | %s%%%02d'''
cfmt = r'''
(                                  # start of capture group 1
%                                  # literal "%"
(?:                                # first option
(?:[-+0 #]{0,5})                   # optional flags
(?:\d+|\*)?                        # width
(?:\.(?:\d+|\*))?                  # precision
(?:h|l|ll|w|I|I32|I64)?            # size
[cCdiouxXeEfgGaAnpsSZ]             # type
) |                                # OR
%%)                                # literal "%%"
'''
for line in lines.splitlines():
    # Report each match as (start index, matched specifier).
    print('"{}"\n\t{}\n'.format(line,
           tuple((m.start(1), m.group(1)) for m in re.finditer(cfmt, line, flags=re.X))))
Prints:
"Worker name is %s and id is %d"
    ((15, '%s'), (28, '%d'))
"That is %i%%"
    ((8, '%i'), (10, '%%'))
"%c"
    ((0, '%c'),)
"Decimal: %d  Justified: %.6d"
    ((9, '%d'), (24, '%.6d'))
"%10c%5hc%5C%5lc"
    ((0, '%10c'), (4, '%5hc'), (8, '%5C'), (11, '%5lc'))
"The temp is %.*f"
    ((12, '%.*f'),)
"%ss%lii"
    ((0, '%s'), (3, '%li'))
"%*.*s | %.3d | %lC | %s%%%02d"
    ((0, '%*.*s'), (8, '%.3d'), (15, '%lC'), (21, '%s'), (23, '%%'), (25, '%02d'))
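To get the function shape asked for in the question, the same pattern can be wrapped as below. This is only a sketch: the names `find_format_specs`, `C_FORMAT` and the `include_escaped` parameter are my own, and the pattern is the Microsoft-based one above, so size prefixes that are not in that list are not recognized.

```python
import re

# Same pattern as above, compiled once in verbose mode.
C_FORMAT = re.compile(r'''
    (                               # capture group 1: one conversion spec
    %                               # literal "%"
    (?:
      [-+0 #]{0,5}                  # optional flags
      (?:\d+|\*)?                   # optional width
      (?:\.(?:\d+|\*))?             # optional precision
      (?:h|l|ll|w|I|I32|I64)?       # optional size
      [cCdiouxXeEfgGaAnpsSZ]        # type
    )
    | %%                            # or an escaped percent sign
    )
''', re.X)


def find_format_specs(fmt, include_escaped=True):
    """Return a tuple of (index, spec) pairs found in a printf-style string.

    Hypothetical helper: with include_escaped=False the "%%" matches are
    dropped, since they do not consume a printf argument.
    """
    hits = ((m.start(1), m.group(1)) for m in C_FORMAT.finditer(fmt))
    return tuple(h for h in hits if include_escaped or h[1] != "%%")


print(find_format_specs("Worker name is %s and id is %d"))
# ((15, '%s'), (28, '%d'))
print(find_format_specs("%%s", include_escaped=False))
# ()
```

Note that this only finds well-formed candidates in a string you already have. Pulling the string literal out of the C source is a separate step.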