为什么gcc没有g ++尾调用优化?

Dan*_*ani 10 optimization

我想检查g ++是否支持尾调用,所以我编写了这个简单的程序来检查它:http://ideone.com/hnXHv

using namespace std;

size_t st;

void PrintStackTop(const std::string &type)
{
    int stack_top;
    if(st == 0) st = (size_t) &stack_top;
    cout << "In " << type << " call version, the stack top is: " << (st - (size_t) &stack_top) << endl;
}

int TailCallFactorial(int n, int a = 1)
{
    PrintStackTop("tail");
    if(n < 2)
        return a;
    return TailCallFactorial(n - 1, n * a);
}

int NormalCallFactorial(int n)
{
    PrintStackTop("normal");
    if(n < 2)
        return 1;
    return NormalCallFactorial(n - 1) * n;
}


int main(int argc, char *argv[])
{
    st = 0;
    cout << TailCallFactorial(5) << endl;
    st = 0;
    cout << NormalCallFactorial(5) << endl;
    return 0;
}
Run Code Online (Sandbox Code Playgroud)

当我正常编译时,似乎g ++并没有真正注意到两个版本之间的任何差异:

> g++ main.cpp -o TailCall
> ./TailCall
In tail call version, the stack top is: 0
In tail call version, the stack top is: 48
In tail call version, the stack top is: 96
In tail call version, the stack top is: 144
In tail call version, the stack top is: 192
120
In normal call version, the stack top is: 0
In normal call version, the stack top is: 48
In normal call version, the stack top is: 96
In normal call version, the stack top is: 144
In normal call version, the stack top is: 192
120
Run Code Online (Sandbox Code Playgroud)

两者中的堆栈差异为48,而尾部调用版本需要多一个int.(为什么?)
所以我认为优化可能很方便:

> g++ -O2 main.cpp -o TailCall
> ./TailCall
In tail call version, the stack top is: 0
In tail call version, the stack top is: 80
In tail call version, the stack top is: 160
In tail call version, the stack top is: 240
In tail call version, the stack top is: 320
120
In normal call version, the stack top is: 0
In normal call version, the stack top is: 64
In normal call version, the stack top is: 128
In normal call version, the stack top is: 192
In normal call version, the stack top is: 256
120
Run Code Online (Sandbox Code Playgroud)

在这两种情况下堆栈大小都增加了,虽然编译器可能认为我的CPU比我的内存慢(不管怎么说),我不知道为什么80字节对于一个简单的函数是必需的.(为什么?).
尾调用版本也比普通版本占用更多空间,如果int的大小为16字节,则它完全合乎逻辑.(不,我没有128位CPU).
现在想一想编译器没有尾调用的原因,我认为它可能是异常,因为它们紧紧依赖于堆栈.所以我尝试了没有例外:

> g++ -O2 -fno-exceptions main.cpp -o TailCall
> ./TailCall
In tail call version, the stack top is: 0
In tail call version, the stack top is: 64
In tail call version, the stack top is: 128
In tail call version, the stack top is: 192
In tail call version, the stack top is: 256
120
In normal call version, the stack top is: 0
In normal call version, the stack top is: 48
In normal call version, the stack top is: 96
In normal call version, the stack top is: 144
In normal call version, the stack top is: 192
120
Run Code Online (Sandbox Code Playgroud)

这将正常版本切换回非优化的堆栈大小,而优化的版本则有8个字节.仍然是一个int不是8个字节.
我以为我在C++中遗漏的东西需要堆栈安排所以我试过c:http://ideone.com/tJPpc
仍然没有尾调用,但堆栈要小得多(两个版本中每帧32位).然后我尝试了优化:

> gcc -O2 main.c -o TailCall
> ./TailCall
In tail call version, the stack top is: 0
In tail call version, the stack top is: 0
In tail call version, the stack top is: 0
In tail call version, the stack top is: 0
In tail call version, the stack top is: 0
120
In normal call version, the stack top is: 0
In normal call version, the stack top is: 0
In normal call version, the stack top is: 0
In normal call version, the stack top is: 0
In normal call version, the stack top is: 0
120
Run Code Online (Sandbox Code Playgroud)

不仅它的尾部调用优化了第一个,它还尾部调用优化了第二个!
为什么g ++没有尾部调用优化,而它在平台上清晰可用?有没有办法强迫它?

fja*_*don 16

因为您正在将临时std :: string对象传递给PrintStackTop(std :: string)函数.此对象在堆栈上分配,从而防止尾调用优化.

我修改了你的代码:

void PrintStackTopStr(char const*const type)
{
    int stack_top;
    if(st == 0) st = (size_t) &stack_top;
    cout << "In " << type << " call version, the stack top is: " << (st - (size_t) &stack_top) << endl;
}

int RealTailCallFactorial(int n, int a = 1)
{
    PrintStackTopStr("tail");
    if(n < 2)
        return a;
    return RealTailCallFactorial(n - 1, n * a);
}
Run Code Online (Sandbox Code Playgroud)

编译:g ++ -O2 -fno-exceptions -o tailcall tailcall.cpp

它现在使用尾调用优化.如果使用-S标志生成程序集,则可以看到它的运行情况:

L39:
        imull   %ebx, %esi
        subl    $1, %ebx
L38:
        movl    $LC2, (%esp)
        call    __Z16PrintStackTopStrPKc
        cmpl    $1, %ebx
        jg      L39
Run Code Online (Sandbox Code Playgroud)

你看到递归调用内联为循环(jg L39).

  • @KonradRudolph因为临时是一个具有析构函数的对象,并且析构函数必须在递归调用返回后运行,所以递归调用不再是尾调用. (5认同)
  • 我不明白为什么临时应该防止TCO. (2认同)