lee*_*777 29 c++ gcc virtual-inheritance vtable vtt
I recently ran into a C++ linker error that was new to me.
libfoo.so: undefined reference to `VTT for Foo'
libfoo.so: undefined reference to `vtable for Foo'
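I recognized the error and fixed my problem, but I still have a nagging question: what exactly is a VTT?
Aside: For those interested, the problem occurs when you forget to define the first virtual function declared in the class. The vtable goes into the compilation unit of the class's first virtual function (more precisely, its first non-inline, non-pure virtual function, which the Itanium C++ ABI calls the "key function"). If you forget to define that function, you get a linker error saying it can't find the vtable rather than the much more developer-friendly "can't find the function".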
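The aside above can be illustrated with a minimal sketch, assuming GCC or Clang on a platform that follows the Itanium C++ ABI. `Foo` is the class name from the error message; the members `first` and `second` and the helpers `call_first` and `check_key_function_demo` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <string>

// Foo's first non-inline, non-pure virtual function is its "key function".
// Under the Itanium C++ ABI the vtable for Foo is emitted only in the
// translation unit that defines that function.
struct Foo {
    virtual std::string first();                       // key function, defined below
    virtual std::string second() { return "second"; }  // inline, so never the key function
    virtual ~Foo() = default;                          // defaulted in class, also inline
};

// Assumption: if this definition is removed, the link step typically fails
// with "undefined reference to `vtable for Foo'" instead of naming Foo::first().
std::string Foo::first() { return "first"; }

// Calls through a reference so the call is dispatched through the vtable.
std::string call_first(Foo& f) { return f.first(); }

// Hypothetical self-check; call it from main() to run the sketch.
void check_key_function_demo() {
    Foo f;
    assert(call_first(f) == "first");
    assert(f.second() == "second");
}
```

The "VTT for Foo" variant of the message would only be expected when the class also has virtual base classes, since that is when a VTT is generated.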
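osg*_*sgx 42
The page "Notes on Multiple Inheritance in GCC C++ Compiler v4.0.1" is now offline, and http://web.archive.org did not archive it. So I found a copy of the text on tinydrblog, which is archived in the web archive.
The full text of the original notes was posted online as part of the "Doctoral Programming Languages Seminar: GCC Internals" (fall 2005) by graduate Morgan Deters in the Distributed Object Computing Laboratory in the Computer Science department at Washington University in St. Louis.
His (archived) homepage: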
THIS IS THE TEXT by Morgan Deters and NOT CC-licensed.
Morgan Deters webpage:
PART1:
The Basics: Single Inheritance
As we discussed in class, single inheritance leads to an object layout with base class data laid out before derived class data. So if classes A and B are defined like this:

    class A {
    public:
      int a;
    };

    class B : public A {
    public:
      int b;
    };

then objects of type B are laid out like this (where "b" is a pointer to such an object):

    b --> +-----------+
          |     a     |
          +-----------+
          |     b     |
          +-----------+

If you have virtual methods:

    class A {
    public:
      int a;
      virtual void v();
    };

    class B : public A {
    public:
      int b;
    };

then you'll have a vtable pointer as well:

                               +-----------------------+
                               |     0 (top_offset)    |
                               +-----------------------+
    b --> +----------+         | ptr to typeinfo for B |
          |  vtable  |-------> +-----------------------+
          +----------+         |         A::v()        |
          |     a    |         +-----------------------+
          +----------+
          |     b    |
          +----------+

That is, top_offset and the typeinfo pointer live above the location to which the vtable pointer points.

Simple Multiple Inheritance
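Now consider multiple inheritance:

    class A {
    public:
      int a;
      virtual void v();
    };

    class B {
    public:
      int b;
      virtual void w();
    };

    class C : public A, public B {
    public:
      int c;
    };

In this case, objects of type C are laid out like this:

                               +-----------------------+
                               |     0 (top_offset)    |
                               +-----------------------+
    c --> +----------+         | ptr to typeinfo for C |
          |  vtable  |-------> +-----------------------+
          +----------+         |         A::v()        |
          |     a    |         +-----------------------+
          +----------+         |    -8 (top_offset)    |
          |  vtable  |---+     +-----------------------+
          +----------+   |     | ptr to typeinfo for C |
          |     b    |   +---> +-----------------------+
          +----------+         |         B::w()        |
          |     c    |         +-----------------------+
          +----------+

...but why? Why two vtables in one? Well, think about type substitution. If I have a pointer-to-C, I can pass it to a function that expects a pointer-to-A or to a function that expects a pointer-to-B. If a function expects a pointer-to-A and I want to pass it the value of my variable c (of type pointer-to-C), I'm already set. Calls to A::v() can be made through the (first) vtable, and the called function can access the member a through the pointer I pass in the same way as it can through any pointer-to-A.

However, if I pass the value of my pointer variable c to a function that expects a pointer-to-B, we also need a subobject of type B in our C to refer it to. This is why we have the second vtable pointer. We can pass the pointer value (c + 8 bytes, on the 32-bit target used in these notes) to the function that expects a pointer-to-B, and it's all set: it can make calls to B::w() through the (second) vtable pointer, and access the member b through the pointer we pass in the same way as it can through any pointer-to-B.

Note that this "pointer correction" needs to occur for called methods too. Class C inherits B::w() in this case. When w() is called through a pointer-to-C, the pointer (which becomes the this pointer inside of w()) needs to be adjusted. This is often called this-pointer adjustment.

In some cases, the compiler will generate a thunk to fix up the address. Consider the same code as above, but this time C overrides B's member function w():

    class A {
    public:
      int a;
      virtual void v();
    };

    class B {
    public:
      int b;
      virtual void w();
    };

    class C : public A, public B {
    public:
      int c;
      void w();
    };

C's object layout and vtable now look like this:

                               +-----------------------+
                               |     0 (top_offset)    |
                               +-----------------------+
    c --> +----------+         | ptr to typeinfo for C |
          |  vtable  |-------> +-----------------------+
          +----------+         |         A::v()        |
          |     a    |         +-----------------------+
          +----------+         |         C::w()        |
          |  vtable  |---+     +-----------------------+
          +----------+   |     |    -8 (top_offset)    |
          |     b    |   |     +-----------------------+
          +----------+   |     | ptr to typeinfo for C |
          |     c    |   +---> +-----------------------+
          +----------+         |    thunk to C::w()    |
                               +-----------------------+

Now, when w() is called on an instance of C through a pointer-to-B, the thunk is called. What does the thunk do? Let's disassemble it (here, with gdb):

    0x0804860c <_ZThn8_N1C1wEv+0>:  addl   $0xfffffff8,0x4(%esp)
    0x08048611 <_ZThn8_N1C1wEv+5>:  jmp    0x804853c <_ZN1C1wEv>

So it merely adjusts the this pointer (0xfffffff8 is -8, undoing the 8-byte offset of the B subobject) and jumps to C::w(). All is well.

But doesn't the above mean that B's vtable always points to this C::w() thunk? I mean, if we have a pointer-to-B that is legitimately a B (not a C), we don't want to invoke the thunk, right?

Right. The above embedded vtable for B in C is special to the B-in-C case. B's regular vtable is normal and points to B::w() directly.

The Diamond: Multiple Copies of Base Classes (non-virtual inheritance)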
Okay. Now to tackle the really hard stuff. Recall the usual problem of multiple copies of base classes when forming an inheritance diamond:

    class A {
    public:
      int a;
      virtual void v();
    };

    class B : public A {
    public:
      int b;
      virtual void w();
    };

    class C : public A {
    public:
      int c;
      virtual void x();
    };

    class D : public B, public C {
    public:
      int d;
      virtual void y();
    };

Note that D inherits from both B and C, and B and C both inherit from A. This means that D has two copies of A in it. The object layout and vtable embedding is what we would expect from the previous sections:

                               +-----------------------+
                               |     0 (top_offset)    |
                               +-----------------------+
    d --> +----------+         | ptr to typeinfo for D |
          |  vtable  |-------> +-----------------------+
          +----------+         |         A::v()        |
          |     a    |         +-----------------------+
          +----------+         |         B::w()        |
          |     b    |         +-----------------------+
          +----------+         |         D::y()        |
          |  vtable  |---+     +-----------------------+
          +----------+   |     |   -12 (top_offset)    |
          |     a    |   |     +-----------------------+
          +----------+   |     | ptr to typeinfo for D |
          |     c    |   +---> +-----------------------+
          +----------+         |         A::v()        |
          |     d    |         +-----------------------+
          +----------+         |         C::x()        |
                               +-----------------------+

Of course, we expect A's data (the member a) to exist twice in D's object layout (and it does), and we expect A's virtual member functions to be represented twice in the vtable (and A::v() is indeed there twice). Okay, nothing new here.

The Diamond: Single Copies of Virtual Bases
But what if we apply virtual inheritance? C++ virtual inheritance allows us to specify a diamond hierarchy but be guaranteed only one copy of virtually inherited bases. So let's write our code this way:

    class A {
    public:
      int a;
      virtual void v();
    };

    class B : public virtual A {
    public:
      int b;
      virtual void w();
    };

    class C : public virtual A {
    public:
      int c;
      virtual void x();
    };

    class D : public B, public C {
    public:
      int d;
      virtual void y();
    };

All of a sudden things get a lot more complicated. If we can only have one copy of A in our representation of D, then we can no longer get away with our "trick" of embedding a C in a D (and embedding a vtable for the C part of D in D's vtable). But how can we handle the usual type substitution if we can't do this?

Let's try to diagram the layout:

                                       +-----------------------+
                                       |   20 (vbase_offset)   |
                                       +-----------------------+
                                       |     0 (top_offset)    |
                                       +-----------------------+
                                       | ptr to typeinfo for D |
                          +----------> +-----------------------+
    d --> +----------+    |            |         B::w()        |
          |  vtable  |----+            +-----------------------+
          +----------+                 |         D::y()        |
          |     b    |                 +-----------------------+
          +----------+                 |   12 (vbase_offset)   |
          |  vtable  |---------+       +-----------------------+
          +----------+         |       |    -8 (top_offset)    |
          |     c    |         |       +-----------------------+
          +----------+         |       | ptr to typeinfo for D |
          |     d    |         +-----> +-----------------------+
          +----------+                 |         C::x()        |
          |  vtable  |----+            +-----------------------+
          +----------+    |            |    0 (vbase_offset)   |
          |     a    |    |            +-----------------------+
          +----------+    |            |   -20 (top_offset)    |
                          |            +-----------------------+
                          |            | ptr to typeinfo for D |
                          +----------> +-----------------------+
                                       |         A::v()        |
                                       +-----------------------+

Okay. So you see that A is now embedded in D in essentially the same way that other bases are. But it's embedded in D rather than in its directly-derived classes.
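The difference between the two diamonds can be observed from ordinary code. The following is a minimal sketch: the namespaces `plain` and `shared` and the helpers `base_through`, `plain_shares_a`, `shared_shares_a` and `check_diamond_demo` are hypothetical names, and the classes mirror the A/B/C/D hierarchies from the notes with inline function bodies added so the sketch links.

```cpp
#include <cassert>

namespace plain {    // non-virtual diamond: D contains two A subobjects
struct A { int a = 0; virtual void v() {} virtual ~A() = default; };
struct B : A { int b = 0; virtual void w() {} };
struct C : A { int c = 0; virtual void x() {} };
struct D : B, C { int d = 0; virtual void y() {} };
}

namespace shared {   // virtual diamond: D contains a single A subobject
struct A { int a = 0; virtual void v() {} virtual ~A() = default; };
struct B : virtual A { int b = 0; virtual void w() {} };
struct C : virtual A { int c = 0; virtual void x() {} };
struct D : B, C { int d = 0; virtual void y() {} };
}

// Reaches the A subobject of d by converting to Via (B or C) first.
template <class A, class Via, class D>
A* base_through(D& d) {
    Via* mid = &d;   // derived-to-base conversion, may adjust the address
    return mid;      // base-to-A conversion (uses vbase_offset for virtual bases)
}

bool plain_shares_a() {
    plain::D d;
    return base_through<plain::A, plain::B>(d) == base_through<plain::A, plain::C>(d);
}

bool shared_shares_a() {
    shared::D d;
    return base_through<shared::A, shared::B>(d) == base_through<shared::A, shared::C>(d);
}

// Hypothetical self-check; call it from main() to run the sketch.
void check_diamond_demo() {
    assert(!plain_shares_a());   // two distinct A subobjects
    assert(shared_shares_a());   // one shared A subobject
}
```

In the virtual case the compiler cannot hard-code the offset from a B or C subobject to A, which is why the vtables in the diagram above carry a vbase_offset entry.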
osg*_*sgx 14
THIS IS THE TEXT by Morgan Deters and NOT CC-licensed.
Morgan Deters webpages:
PART2:
Construction/Destruction in the Presence of Multiple Inheritance
How is the above object constructed in memory when the object itself is constructed? And how do we ensure that a partially-constructed object (and its vtable) are safe for constructors to operate on?
Fortunately, it's all handled very carefully for us. Say we're constructing a new object of type D (through, for example, new D). First, the memory for the object is allocated in the heap and a pointer returned. D's constructor is invoked, but before doing any D-specific construction it calls A's constructor on the object (after adjusting the this pointer, of course!). A's constructor fills in the A part of the D object as if it were an instance of A.

    d --> +----------+
          |          |
          +----------+
          |          |
          +----------+
          |          |
          +----------+
          |          |       +-----------------------+
          +----------+       |     0 (top_offset)    |
          |          |       +-----------------------+
          +----------+       | ptr to typeinfo for A |
          |  vtable  |-----> +-----------------------+
          +----------+       |         A::v()        |
          |     a    |       +-----------------------+
          +----------+

Control is returned to D's constructor, which invokes B's constructor. (Pointer adjustment isn't needed here.) When B's constructor is done, the object looks like this:

                                                     B-in-D
                                  +-----------------------+
                                  |   20 (vbase_offset)   |
                                  +-----------------------+
                                  |     0 (top_offset)    |
                                  +-----------------------+
    d --> +----------+        | ptr to typeinfo for B |
          |  vtable  |------> +-----------------------+
          +----------+        |         B::w()        |
          |     b    |        +-----------------------+
          +----------+        |    0 (vbase_offset)   |
          |          |        +-----------------------+
          +----------+        |   -20 (top_offset)    |
          |          |        +-----------------------+
          +----------+        | ptr to typeinfo for B |
          |          |   +--> +-----------------------+
          +----------+   |    |         A::v()        |
          |  vtable  |---+    +-----------------------+
          +----------+
          |     a    |
          +----------+

But wait... B's constructor modified the A part of the object by changing its vtable pointer! How did it know to distinguish this kind of B-in-D from a B-in-something-else (or a standalone B, for that matter)? Simple. The virtual table table told it to do this. This structure, abbreviated VTT, is a table of vtables used in construction.

In our case, the VTT for D looks like this:

                                                                          B-in-D
                                                       +-----------------------+
                                                       |   20 (vbase_offset)   |
                VTT for D                          +-----------------------+
    +-------------------+                          |     0 (top_offset)    |
    |    vtable for D   |-------------+            +-----------------------+
    +-------------------+             |            | ptr to typeinfo for B |
    | vtable for B-in-D |-------------|----------> +-----------------------+
    +-------------------+             |            |         B::w()        |
    | vtable for B-in-D |-------------|--------+   +-----------------------+
    +-------------------+             |        |   |    0 (vbase_offset)   |
    | vtable for C-in-D |-------------|-----+  |   +-----------------------+
    +-------------------+             |     |  |   |   -20 (top_offset)    |
    | vtable for C-in-D |-------------|--+  |  |   +-----------------------+
    +-------------------+             |  |  |  |   | ptr to typeinfo for B |
    |    vtable for D   |----------+  |  |  |  +-> +-----------------------+
    +-------------------+          |  |  |  |      |         A::v()        |
    |    vtable for D   |-------+  |  |  |  |      +-----------------------+
    +-------------------+       |  |  |  |  |
                                |  |  |  |  |                         C-in-D
                                |  |  |  |  |      +-----------------------+
                                |  |  |  |  |      |   12 (vbase_offset)   |
                                |  |  |  |  |      +-----------------------+
                                |  |  |  |  |      |     0 (top_offset)    |
                                |  |  |  |  |      +-----------------------+
                                |  |  |  |  |      | ptr to typeinfo for C |
                                |  |  |  |  +----> +-----------------------+
                                |  |  |  |         |         C::x()        |
                                |  |  |  |         +-----------------------+
                                |  |  |  |         |    0 (vbase_offset)   |
                                |  |  |  |         +-----------------------+
                                |  |  |  |         |   -12 (top_offset)    |
                                |  |  |  |         +-----------------------+
                                |  |  |  |         | ptr to typeinfo for C |
                                |  |  |  +-------> +-----------------------+
                                |  |  |            |         A::v()        |
                                |  |  |            +-----------------------+
                                |  |  |
                                |  |  |                                    D
                                |  |  |            +-----------------------+
                                |  |  |            |   20 (vbase_offset)   |
                                |  |  |            +-----------------------+
                                |  |  |            |     0 (top_offset)    |
                                |  |  |            +-----------------------+
                                |  |  |            | ptr to typeinfo for D |
                                |  |  +----------> +-----------------------+
                                |  |               |         B::w()        |
                                |  |               +-----------------------+
                                |  |               |         D::y()        |
                                |  |               +-----------------------+
                                |  |               |   12 (vbase_offset)   |
                                |  |               +-----------------------+
                                |  |               |    -8 (top_offset)    |
                                |  |               +-----------------------+
                                |  |               | ptr to typeinfo for D |
                                +----------------> +-----------------------+
                                   |               |         C::x()        |
                                   |               +-----------------------+
                                   |               |    0 (vbase_offset)   |
                                   |               +-----------------------+
                                   |               |   -20 (top_offset)    |
                                   |               +-----------------------+
                                   |               | ptr to typeinfo for D |
                                   +-------------> +-----------------------+
                                                   |         A::v()        |
                                                   +-----------------------+

D's constructor passes a pointer into D's VTT to B's constructor (in this case, it passes in the address of the first B-in-D entry). And, indeed, the vtable that was used for the object layout above is a special vtable used just for the construction of B-in-D.
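The VTT itself is not visible from C++ source, but its effect is: while each base constructor runs, virtual calls resolve as if the object were only that base. Here is a minimal sketch of that observable behaviour; the virtual function `name`, the global `events` and the helpers `trace_construction_of_d` and `check_construction_trace` are hypothetical additions to the A/B/C/D hierarchy from the notes.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> events;   // hypothetical trace of what each constructor observed

struct A {
    A() { events.push_back("A() sees " + name()); }
    virtual std::string name() const { return "A"; }
    virtual ~A() = default;
};

struct B : virtual A {
    B() { events.push_back("B() sees " + name()); }   // dispatches via the B-in-D construction vtable
    std::string name() const override { return "B"; }
};

struct C : virtual A {
    C() { events.push_back("C() sees " + name()); }   // dispatches via the C-in-D construction vtable
    std::string name() const override { return "C"; }
};

struct D : B, C {
    D() { events.push_back("D() sees " + name()); }   // final vtable for D is installed by now
    std::string name() const override { return "D"; }
};

// Constructs one D and returns what each constructor saw.
std::vector<std::string> trace_construction_of_d() {
    events.clear();
    D d;
    return events;
}

// Hypothetical self-check; call it from main() to run the sketch.
void check_construction_trace() {
    const std::vector<std::string> expected = {
        "A() sees A", "B() sees B", "C() sees C", "D() sees D"};
    assert(trace_construction_of_d() == expected);
}
```

The language rule that a constructor sees the object as its own class is what the construction vtables implement; the VTT is how GCC tells a shared base constructor which of those vtables to install.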
Control is returned to the D constructor, and it calls the C constructor (with a VTT address parameter pointing to the "C-in-D+12" entry). When C's constructor is done with the object it looks like this:

                                                                                   B-in-D
                                                                +-----------------------+
                                                                |   20 (vbase_offset)   |
                                                                +-----------------------+
                                                                |     0 (top_offset)    |
                                                                +-----------------------+
                                                                | ptr to typeinfo for B |
                        +---------------------------------> +-----------------------+
                        |                                   |         B::w()        |
                        |                                   +-----------------------+
                        |                          C-in-D   |    0 (vbase_offset)   |
                        |       +-----------------------+   +-----------------------+
    d --> +----------+  |       |   12 (vbase_offset)   |   |   -20 (top_offset)    |
          |  vtable  |--+       +-----------------------+   +-----------------------+
          +----------+          |     0 (top_offset)    |   | ptr to typeinfo for B |
          |     b    |          +-----------------------+   +-----------------------+
          +----------+          | ptr to typeinfo for C |   |         A::v()        |
          |  vtable  |--------> +-----------------------+   +-----------------------+
          +----------+          |         C::x()        |
          |     c    |          +-----------------------+
          +----------+          |    0 (vbase_offset)   |
          |          |          +-----------------------+
          +----------+          |   -12 (top_offset)    |
          |  vtable  |--+       +-----------------------+
          +----------+  |       | ptr to typeinfo for C |
          |     a    |  +-----> +-----------------------+
          +----------+          |         A::v()        |
                                +-----------------------+

As you see, C's constructor again modified the embedded A's vtable pointer. The embedded C and A objects are now using the special construction C-in-D vtable, and the embedded B object is using the special construction B-in-D vtable. Finally, D's constructor finishes the job and we end up with the same diagram as before:

                                       +-----------------------+
                                       |   20 (vbase_offset)   |
                                       +-----------------------+
                                       |     0 (top_offset)    |
                                       +-----------------------+
                                       | ptr to typeinfo for D |
                          +----------> +-----------------------+
    d --> +----------+    |            |         B::w()        |
          |  vtable  |----+            +-----------------------+
          +----------+                 |         D::y()        |
          |     b    |                 +-----------------------+
          +----------+                 |   12 (vbase_offset)   |
          |  vtable  |---------+       +-----------------------+
          +----------+         |       |    -8 (top_offset)    |
          |     c    |         |       +-----------------------+
          +----------+         |       | ptr to typeinfo for D |
          |     d    |         +-----> +-----------------------+
          +----------+                 |         C::x()        |
          |  vtable  |----+            +-----------------------+
          +----------+    |            |    0 (vbase_offset)   |
          |     a    |    |            +-----------------------+
          +----------+    |            |   -20 (top_offset)    |
                          |            +-----------------------+
                          |            | ptr to typeinfo for D |
                          +----------> +-----------------------+
                                       |         A::v()        |
                                       +-----------------------+

Destruction occurs in the same fashion but in reverse. D's destructor is invoked. After the user's destruction code runs, the destructor calls C's destructor and directs it to use the relevant portion of D's VTT. C's destructor manipulates the vtable pointers in the same way it did during construction; that is, the relevant vtable pointers now point into the C-in-D construction vtable. Then it runs the user's destruction code for C and returns control to D's destructor, which next invokes B's destructor with a reference into D's VTT. B's destructor sets up the relevant portions of the object to refer into the B-in-D construction vtable. It runs the user's destruction code for B and returns control to D's destructor, which finally invokes A's destructor. A's destructor changes the vtable for the A portion of the object to refer into the vtable for A. Finally, control returns to D's destructor and destruction of the object is complete. The memory once used by the object is returned to the system.
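The reverse order described above can be checked with a small sketch along the same lines. The virtual function `name`, the global `events` and the helpers `trace_destruction_of_d` and `check_destruction_trace` are hypothetical; each destructor records which override it sees while its user code runs.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> events;   // hypothetical trace of what each destructor observed

struct A {
    virtual std::string name() const { return "A"; }
    virtual ~A() { events.push_back("~A sees " + name()); }
};

struct B : virtual A {
    std::string name() const override { return "B"; }
    ~B() override { events.push_back("~B sees " + name()); }
};

struct C : virtual A {
    std::string name() const override { return "C"; }
    ~C() override { events.push_back("~C sees " + name()); }
};

struct D : B, C {
    std::string name() const override { return "D"; }
    ~D() override { events.push_back("~D sees " + name()); }
};

// Creates and destroys one D, returning what each destructor saw.
std::vector<std::string> trace_destruction_of_d() {
    events.clear();
    { D d; }   // d is destroyed at the closing brace
    return events;
}

// Hypothetical self-check; call it from main() to run the sketch.
void check_destruction_trace() {
    const std::vector<std::string> expected = {
        "~D sees D", "~C sees C", "~B sees B", "~A sees A"};
    assert(trace_destruction_of_d() == expected);
}
```

The virtual base A is destroyed last, and only once, which matches D's destructor being the one that finally invokes A's destructor.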
Now, in fact, the story is somewhat more complicated. Have you ever seen those "in-charge" and "not-in-charge" constructor and destructor specifications in GCC-produced warning and error messages or in GCC-produced binaries? Well, the fact is that there can be two constructor implementations and up to three destructor implementations.
An "in-charge" (or complete object) constructor is one that constructs virtual bases, and a "not-in-charge" (or base object) constructor is one that does not. Consider our above example. If a B is constructed, its constructor needs to call A's constructor to construct it. Similarly, C's constructor needs to construct A. However, if B and C are constructed as part of a construction of a D, their constructors should not construct A, because A is a virtual base and D's constructor will take care of constructing it exactly once for the instance of D. Consider the cases:
If you do a new A, A's "in-charge" constructor is invoked to construct A. When you do a new B, B's "in-charge" constructor is invoked. It will call the "not-in-charge" constructor for A.
new C is similar to new B.
A new D invokes D's "in-charge" constructor. We walked through this example. D's "in-charge" constructor calls the "not-in-charge" versions of A's, B's, and C's constructors (in that order).
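The cases above have a consequence that is visible in portable C++: the virtual base is initialized exactly once, by the most-derived class, and the initializers that B and C supply for A are skipped when they run as base subobjects. A minimal sketch, with the counter `constructions`, the constructor arguments 1, 2, 3 and the helper functions being hypothetical choices for illustration:

```cpp
#include <cassert>

struct A {
    static int constructions;   // how many times A's constructor has run
    int a;
    explicit A(int value) : a(value) { ++constructions; }
    virtual ~A() = default;
};
int A::constructions = 0;

// When B or C is the complete object, its "in-charge" constructor builds A.
struct B : virtual A { B() : A(1) {} };
struct C : virtual A { C() : A(2) {} };

// When D is the complete object, only D's initializer for A is used; the
// A(1) and A(2) initializers are ignored ("not-in-charge" B and C).
struct D : B, C { D() : A(3), B(), C() {} };

int a_in_complete_b() { B b; return b.a; }
int a_in_complete_c() { C c; return c.a; }
int a_in_complete_d() { D d; return d.a; }

int a_constructions_for_one_d() {
    A::constructions = 0;
    D d;
    return A::constructions;
}

// Hypothetical self-check; call it from main() to run the sketch.
void check_in_charge_demo() {
    assert(a_in_complete_b() == 1);
    assert(a_in_complete_c() == 2);
    assert(a_in_complete_d() == 3);
    assert(a_constructions_for_one_d() == 1);
}
```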
An "in-charge" destructor is the analogue of an "in-charge" constructor: it takes charge of destructing virtual bases. Similarly, a "not-in-charge" destructor is generated. But there's a third one as well. An "in-charge deleting" destructor is one that deallocates the storage as well as destructing the object. So when is one called in preference to the other?
Well, there are two kinds of objects that can be destructed---those allocated on the stack, and those allocated in the heap. Consider this code (given our diamond hierarchy with virtual-inheritance from before):

    D d;            // allocates a D on the stack and constructs it
    D *pd = new D;  // allocates a D in the heap and constructs it
    /* ... */
    delete pd;      // calls "in-charge deleting" destructor for D
    return;         // calls "in-charge" destructor for stack-allocated D

We see that the actual delete operator isn't invoked by the code doing the delete, but rather by the in-charge deleting destructor for the object being deleted. Why do it this way? Why not have the caller call the in-charge destructor, then delete the object? Then you'd have only two copies of destructor implementations instead of three...
Well, the compiler could do such a thing, but it would be more complicated for other reasons. Consider this code (assuming a virtual destructor, which you always use, right?...right?!?):

    D *pd = new D;  // allocates a D in the heap and constructs it
    C *pc = pd;     // we have a pointer-to-C that points to our heap-allocated D
    /* ... */
    delete pc;      // call destructor thunk through vtable, but what about delete?

If you didn't have an "in-charge deleting" variety of D's destructor, then the delete operation would need to adjust the pointer just like the destructor thunk does. Remember, the C object is embedded in a D, and so our pointer-to-C above is adjusted to point into the middle of our D object. We can't just delete this pointer, since it isn't the pointer that was returned by malloc() when we constructed it.

So, if we didn't have an in-charge deleting destructor, we'd have to have thunks to the delete operator (and represent them in our vtables), or something else similar.
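The pointer adjustment and the role of the deleting destructor can be made visible with a class-specific allocator. This is a minimal sketch under stated assumptions: the members `allocated` and `freed`, the type `DeleteReport` and the helpers `delete_through_c` and `check_delete_demo` are hypothetical, and the claim that the C subobject sits at a non-zero offset holds for the Itanium ABI layout shown in the notes rather than being guaranteed by the language.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

struct A { int a = 0; virtual ~A() = default; };
struct B : virtual A { int b = 0; };
struct C : virtual A { int c = 0; };

struct D : B, C {
    int d = 0;
    static std::uintptr_t allocated;   // address handed out by operator new
    static std::uintptr_t freed;       // address received by operator delete

    static void* operator new(std::size_t size) {
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        allocated = reinterpret_cast<std::uintptr_t>(p);
        return p;
    }
    static void operator delete(void* p) {
        freed = reinterpret_cast<std::uintptr_t>(p);
        std::free(p);
    }
};
std::uintptr_t D::allocated = 0;
std::uintptr_t D::freed = 0;

struct DeleteReport {
    bool pointer_was_adjusted;   // pointer-to-C differs from pointer-to-D
    bool original_block_freed;   // operator delete got the address from operator new
};

DeleteReport delete_through_c() {
    D* pd = new D;
    C* pc = pd;   // points into the middle of the D object
    bool adjusted = reinterpret_cast<std::uintptr_t>(pc) !=
                    reinterpret_cast<std::uintptr_t>(pd);
    delete pc;    // virtual destructor: D's deleting destructor does the deallocation
    return DeleteReport{adjusted, D::freed == D::allocated};
}

// Hypothetical self-check; call it from main() to run the sketch.
void check_delete_demo() {
    DeleteReport r = delete_through_c();
    assert(r.pointer_was_adjusted);
    assert(r.original_block_freed);
}
```

Because the deallocation happens inside D's destructor, the caller never needs to know how far `pc` is from the start of the allocation.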
Thunks, Virtual and Non-Virtual
This section not written yet.
Multiple Inheritance with Virtual Methods on One Side
Okay. One last exercise. What if we have a diamond inheritance hierarchy with virtual inheritance, as before, but only have virtual methods along one side of it? So:

    class A {
    public:
      int a;
    };

    class B : public virtual A {
    public:
      int b;
      virtual void w();
    };

    class C : public virtual A {
    public:
      int c;
    };

    class D : public B, public C {
    public:
      int d;
      virtual void y();
    };

In this case the object layout is the following:

                                       +-----------------------+
                                       |   20 (vbase_offset)   |
                                       +-----------------------+
                                       |     0 (top_offset)    |
                                       +-----------------------+
                                       | ptr to typeinfo for D |
                          +----------> +-----------------------+
    d --> +----------+    |            |         B::w()        |
          |  vtable  |----+            +-----------------------+
          +----------+                 |         D::y()        |
          |     b    |                 +-----------------------+
          +----------+                 |   12 (vbase_offset)   |
          |  vtable  |---------+       +-----------------------+
          +----------+         |       |    -8 (top_offset)    |
          |     c    |         |       +-----------------------+
          +----------+         |       | ptr to typeinfo for D |
          |     d    |         +-----> +-----------------------+
          +----------+
          |     a    |
          +----------+

So you can see the C subobject, which has no virtual methods, still has a vtable (albeit empty). Indeed, all instances of C have an empty vtable.
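One way to see that hidden pointer from source code is to compare sizes. The sketch below assumes a mainstream ABI (Itanium or MSVC), where a class with a virtual base carries a hidden pointer even without virtual functions; the type `Flat`, the two constants and `check_hidden_pointer_demo` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <type_traits>

struct A { int a; };
struct C : virtual A { int c; };   // virtual base, but no virtual functions
struct Flat { int a; int c; };     // the same data members without a virtual base

// C declares and inherits no virtual functions, so the language does not
// consider it polymorphic...
constexpr bool c_is_polymorphic = std::is_polymorphic<C>::value;

// ...yet it is larger than its data alone, because it needs a hidden
// pointer through which the offset of the virtual base A can be found.
constexpr bool c_has_hidden_pointer = sizeof(C) > sizeof(Flat);

// Hypothetical self-check; call it from main() to run the sketch.
void check_hidden_pointer_demo() {
    assert(!c_is_polymorphic);
    assert(c_has_hidden_pointer);
}
```

The vtable is "empty" only in the sense that it holds no function pointers; it still carries the vbase_offset, top_offset and typeinfo entries shown in the diagram.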
Thanks, Morgan Deters!!