v.oddou | score: 1 | tags: c++ python parsing clang
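I am writing a documentation generator, and getting the include paths right is a pain, so when I parse a file I simply skip every include entirely. I also manually adjust all the problematic defines and #ifdef blocks that would otherwise be skipped because of the missing includes (and because the command line differs from the production build).
The problem I noticed is this declaration: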
struct ComplexBuffer : IAnimatable
{
};
where IAnimatable is undeclared (or only forward-declared).
I am using the clang.cindex Python bindings and iterate with get_children; this is what comes out:
Found grammar element "IAnimatable" {CursorKind.CLASS_DECL} [line=37, col=8]
Found grammar element "ComplexBuffer" {CursorKind.STRUCT_DECL} [line=39, col=9]
If I complete the base type:
class IAnimatable {};
struct ComplexBuffer : IAnimatable
I get the correct output:
Found grammar element "IAnimatable" {CursorKind.CLASS_DECL} [line=37, col=8]
Found grammar element "ComplexBuffer" {CursorKind.STRUCT_DECL} [line=39, col=9]
Found grammar element "class IAnimatable" {CursorKind.CXX_BASE_SPECIFIER} [line=39, col=25]
Found grammar element "class IAnimatable" {CursorKind.TYPE_REF} [line=39, col=25]
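That is exactly what I want, because I can detect the inheritance list and put it in the documentation.
The problem only occurs because I skip all the includes.
Maybe I could work around it by manually re-parsing the declaration line?
EDIT / P.S.: for completeness, here is my Python parsing script: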
import sys
import clang.cindex

index = clang.cindex.Index.create()
tu = index.parse(sys.argv[1], args=["-std=c++98"],
                 options=clang.cindex.TranslationUnit.PARSE_SKIP_FUNCTION_BODIES)

def printall_visitor(node):
    print('Found grammar element "%s" {%s} [line=%s, col=%s]' % (
        node.displayname, node.kind, node.location.line, node.location.column))

def visit(node, func):
    # depth-first walk over the AST, calling func on every cursor
    func(node)
    for c in node.get_children():
        visit(c, func)

visit(tu.cursor, printall_visitor)
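The manual re-parse asked about above could also work on libclang's tokens instead of raw source lines, which sidesteps whitespace and line breaks. A minimal sketch under assumptions: the token spellings would come from `[t.spelling for t in cursor.get_tokens()]` on the STRUCT_DECL/CLASS_DECL cursor (`Cursor.get_tokens()` exists in clang.cindex, but I have not verified that the cursor's extent still covers the base clause when the base is undeclared). `base_list_from_tokens` is a hypothetical helper that works on plain strings, so it can be tried without libclang. Unlike a parser that strips template arguments, it keeps them.

```python
# Hypothetical helper: extract base-class names from the token spellings of a
# class/struct declaration, e.g. [t.spelling for t in cursor.get_tokens()].
# Assumption: the cursor extent still spans the base clause when the base is undeclared.

SKIPPED_KEYWORDS = {"public", "protected", "private", "virtual"}


def base_list_from_tokens(tokens):
    """Return the bases listed between the inheritance ':' and the body '{'."""
    # keep only the declaration head: everything before the body '{' (or a ';')
    head = []
    for tok in tokens:
        if tok in ("{", ";"):
            break
        if not tok.startswith(("//", "/*")):  # skip comment tokens, if any are reported
            head.append(tok)
    if ":" not in head:
        return []  # no inheritance clause ('::' is lexed as a separate token)
    bases, current, depth = [], [], 0
    for tok in head[head.index(":") + 1:]:
        if tok == "<":
            depth += 1
        elif tok == ">":
            depth -= 1
        elif tok == ">>":
            depth -= 2  # the lexer produces '>>' as one token
        if depth == 0 and tok == ",":
            bases.append("".join(current))  # top-level comma: start the next base
            current = []
        elif depth == 0 and tok in SKIPPED_KEYWORDS:
            continue  # drop access specifiers and 'virtual'
        else:
            current.append(tok)
    if current:
        bases.append("".join(current))
    return bases


# usage with the AnimHandler declaration from the answer, tokenized by hand
tokens = ["struct", "AnimHandler", ":", "NonCopyable", ",", "IHandlerPrivateGetter",
          "<", "AnimHandler", ",", "AafHandler", ">", "// CRTP", "{"]
print(base_list_from_tokens(tokens))
```

Cutting the token list at the first `{` before looking for `:` matters: otherwise a `public:` label inside the class body would be mistaken for an inheritance colon.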
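I'll answer this myself, because I think the code I came up with can be very useful to future googlers.
In the end, I wrote two methods that should be used together to retrieve the list of base classes from the inheritance list on a class declaration line:
one that uses the AST cursors, and one that is fully manual and copes with as much of C++'s complexity as it can.
Here is the whole result: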
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
'''
Created on 2013/12/09
@author: voddou
'''
import sys
import re
import os
from contextlib import contextmanager

import clang.cindex


class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    CYAN = '\033[96m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    MAGENTA = '\033[95m'
    GREY = '\033[90m'

    @classmethod
    def disable(cls):
        # switch every color off (e.g. when the output is not a terminal)
        cls.HEADER = ''
        cls.OKBLUE = ''
        cls.OKGREEN = ''
        cls.WARNING = ''
        cls.FAIL = ''
        cls.ENDC = ''
        cls.CYAN = ''
        cls.MAGENTA = ''
        cls.GREY = ''


@contextmanager
def scopedColorizer(color):
    sys.stdout.write(color)
    try:
        yield
    finally:
        sys.stdout.write(bcolors.ENDC)  # always restore the default color


#clang.cindex.Config.set_library_file("C:/python27/DLLs/libclang.dll")
src_filepath = sys.argv[1]
src_basename = os.path.basename(src_filepath)
with open(src_filepath) as f:
    parseeLines = f.readlines()


def trim_all(astring):
    return "".join(astring.split())


def has_token(line, token):
    return token in trim_all(line)


def has_any_token(line, token_list):
    return any(has_token(line, t) for t in token_list)


def is_any(astring, some_strings):
    return any(x == astring for x in some_strings)


def comment_out(line):
    return "//" + line


# alter the original file to remove #include directives and protective ifdef blocks
for i, l in enumerate(parseeLines):
    if has_token(l, "#include"):
        parseeLines[i] = comment_out(l)
    elif has_any_token(l, ["#ifdef", "#ifdefined", "#ifndef", "#if!defined", "#endif", "#elif", "#else"]):
        parseeLines[i] = comment_out(l)

# parse the altered text from memory (unsaved_files); the file on disk is untouched
index = clang.cindex.Index.create()
tu = index.parse(src_basename,
                 args=["-std=c++98"],
                 unsaved_files=[(src_basename, "".join(parseeLines))],
                 options=clang.cindex.TranslationUnit.PARSE_SKIP_FUNCTION_BODIES)
print('Translation unit:', tu.spelling, "\n")


def gather_until(strlist, ifrom, endtokens):
    """make one string out of a list of strings, starting from a given index, until one token in endtokens is found.
    ex: gather_until(["foo", "toto", "bar", "kaz"], 1, ["r", "z"])
    will yield "totoba"
    """
    result = strlist[ifrom]
    nextline = ifrom + 1
    # stop at the first end token, or at the end of the file
    while not any(token in result for token in endtokens) and nextline < len(strlist):
        result = result + strlist[nextline]
        nextline = nextline + 1
    nearest = result
    for t in endtokens:
        nearest = nearest.partition(t)[0]
    return nearest


def strip_templates_parameters(declline):
    """remove any content between < > (nesting-aware); the outer brackets are kept
    """
    res = ""
    nested = 0
    for c in declline:
        if c == '>':
            nested = nested - 1
        if nested == 0:
            res = res + c
        if c == '<':
            nested = nested + 1
    return res


# thanks Markus Jarderot from Stackoverflow.com
def comment_remover(text):
    def replacer(match):
        s = match.group(0)
        if s.startswith('/'):
            return ""  # a comment: drop it
        else:
            return s   # a string or character literal: keep it
    pattern = re.compile(
        r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
        re.DOTALL | re.MULTILINE
    )
    return re.sub(pattern, replacer, text)


def replace_any_of(haystack, list_of_candidates, by_what):
    for cand in list_of_candidates:
        haystack = haystack.replace(cand, by_what)
    return haystack


cxx_keywords = ["class", "struct", "public", "private", "protected", "virtual"]


def clean_name(displayname):
    """remove namespace and type tags
    """
    r = displayname.rpartition("::")[2]
    r = replace_any_of(r, cxx_keywords, "")
    return r.strip()  # "class IAnimatable" would otherwise keep a leading space


def find_parents_using_clang(node):
    l = []
    for c in node.get_children():
        if c.kind == clang.cindex.CursorKind.CXX_BASE_SPECIFIER:
            l.append(clean_name(c.displayname))
    return None if len(l) == 0 else l


# syntax based custom parsing
def find_parents_list(node):
    ideclline = node.location.line - 1
    declline = parseeLines[ideclline]
    with scopedColorizer(bcolors.WARNING):
        print("class decl line:", declline.strip())
    fulldecl = gather_until(parseeLines, ideclline, ["{", ";"])
    # remove comments before joining the lines, otherwise a '//' comment
    # would swallow every base listed on the following lines
    fulldecl = comment_remover(fulldecl)
    fulldecl = trim_all(fulldecl)
    # split at the inheritance ':' only, never at a '::' scope operator
    parts = re.split(r'(?<!:):(?!:)', fulldecl, maxsplit=1)
    if len(parts) == 2:  # if inheritance exists on the declaration
        # template arguments are separated by commas, they would break the split(",")
        baselist = strip_templates_parameters(parts[1])
        return [clean_name(b) for b in baselist.split(",")]
    return None


# documentation generator
def make_htll_visitor(node):
    if (node.kind == clang.cindex.CursorKind.CLASS_DECL
            or node.kind == clang.cindex.CursorKind.STRUCT_DECL
            or node.kind == clang.cindex.CursorKind.CLASS_TEMPLATE):
        bases2 = find_parents_list(node)
        bases = find_parents_using_clang(node)
        if bases is not None:
            with scopedColorizer(bcolors.CYAN):
                print("class clang list of bases:", str(bases))
        if bases2 is not None:
            with scopedColorizer(bcolors.MAGENTA):
                print("class manual list of bases:", str(bases2))


def visit(node, func):
    func(node)
    for c in node.get_children():
        visit(c, func)


visit(tu.cursor, make_htll_visitor)
with scopedColorizer(bcolors.OKGREEN):
    print("all over")
This code lets me accept incomplete C++ translation units and correctly parse declarations such as:
struct ComplexBuffer
: IAnimatable
, Bugger,
Mozafoka
{
};
and it copes with ones like this:
struct AnimHandler : NonCopyable, IHandlerPrivateGetter< AnimHandler, AafHandler > // CRTP
{
...
};
giving me this output:
class manual list of bases: ['NonCopyable', 'IHandlerPrivateGetter<>']
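Which is good, since the clang-based version did not return a single class of this base list. A set could now be used to merge the results of both functions, to be on the safe side in case the manual parser misses something. However, I think that may produce subtle duplicates, because clang's displayname and the names produced by my own parser differ.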
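Merging the two lists without those duplicates needs both spellings mapped to one key first. A minimal sketch, assuming the clang side yields strings such as `class IAnimatable` (the displayname shown in the question) and that a template base may come back with its arguments spelled out (I have not checked clang's exact spelling for that case). `normalize_base` and `merge_bases` are hypothetical helpers; they mirror the script's choices of dropping namespaces and collapsing template arguments to `<>`, so two same-named bases from different namespaces would merge into one.

```python
import re

# Hypothetical helpers for merging the clang-based and the manual base lists.
_TAG = re.compile(r'^(?:class|struct)\s+')


def normalize_base(name):
    """Map clang and manual spellings of one base class to the same key."""
    name = _TAG.sub("", name.strip())   # "class IAnimatable" -> "IAnimatable"
    name = "".join(name.split())        # drop all whitespace
    head, lt, _ = name.partition("<")   # separate template arguments, if any
    head = head.rpartition("::")[2]     # drop namespace qualifiers
    return head + ("<>" if lt else "")  # collapse arguments to "<>"


def merge_bases(clang_bases, manual_bases):
    """Ordered union of both lists; either may be None, as in the script."""
    keys = [normalize_base(b) for b in (clang_bases or []) + (manual_bases or [])]
    # dicts keep insertion order (Python 3.7+), so this de-duplicates in first-seen order
    return list(dict.fromkeys(keys))


print(merge_bases(["class IAnimatable"], ["IAnimatable", "Bugger", "Mozafoka"]))
print(merge_bases(None, ["NonCopyable", "IHandlerPrivateGetter<>"]))
```

A plain `set` would also remove the duplicates, but it loses the declaration order of the bases, which is usually worth keeping in generated documentation.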
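But there you go, googlers: a decent template for a clang-based Python documentation generator. It does not require fully correct build options, and it is very fast because it ignores the include statements entirely.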
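Ignoring the includes relies on the script's preprocessing loop. That loop comments out `#ifdef`, `#ifndef` and `#endif`, but not a plain `#if SOMETHING`, so the `#endif` of such a block is commented out while its `#if` stays, leaving an unterminated conditional. A possible stricter variant treats a line as a directive only when `#` is its first non-blank character and comments out every conditional directive consistently. `neutralize_directives` is a hypothetical name, and there is a trade-off: code inside `#if 0` blocks becomes visible to the parser.

```python
import re

# Hypothetical stricter version of the script's preprocessing loop: a line counts
# as a directive only if '#' is its first non-blank character.
DIRECTIVE = re.compile(r'^\s*#\s*(?:include|if|ifdef|ifndef|elif|else|endif)\b')


def neutralize_directives(lines):
    """Comment out #include and every conditional directive, so #if/#endif stay balanced."""
    return ["//" + line if DIRECTIVE.match(line) else line for line in lines]


source = [
    "#include <vector>\n",
    "#if FOO_VERSION > 2\n",   # not matched by the original token list
    "struct A : Base {};\n",
    "#endif\n",
    "#define BAR 1\n",         # kept: defines are still needed
]
for line in neutralize_directives(source):
    print(line, end="")
```

In the script, the loop would then reduce to `parseeLines = neutralize_directives(parseeLines)`.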
Have a nice day, everyone.