优化for循环的函数调用

Ada*_*ora 5 c++ optimization for-loop function

我有一些简单的功能

int f_0(int);
int f_1(int);
...
int f_n(int);
Run Code Online (Sandbox Code Playgroud)

然后我有一些for循环,我调用f_i(),这个循环中的条件不必相同

for (int i = 0; i < n; i++) {
   ...
   if (condition) {
      int myInt = f_i(); // this is not real implementation but shows the result
                         // I want to achieve
      ... //edit
   }
...
}
Run Code Online (Sandbox Code Playgroud)

以下是我尝试实现此方法的方法:

  • 分解for循环并在相应的部分中调用每个函数.这导致最快的代码,但这是非常不优雅的,并且这样的代码很难进一步开发.
  • 指向功能的指针

    typedef int (*Foo) (int);

    Foo fptr[] = { f_0, f_1, ... , f_n };

这是一种优雅的方法,但在我的情况下,它比分解循环慢4.4.函数的常量指针产生类似的结果.

  • 将我的功能封装到开关功能中.这比打破循环慢2.6.

有没有更好的方法来实现这个?理想的解决方案是具有紧凑代码的解决方案,但编译器会分解循环并让计算最快.

我正在使用MSVC 2012并在发布模式下运行,并将优化设置为最大化速度.

编辑:

这是我的测试代码:

head.h

namespace c {
const int w = 1024;
const int A = w * w;
}

inline int f_0(int pos)  { return (pos - c::w + c::A) % c::A;           }
inline int f_1(int pos)  { return (pos + 1 - c::w + c::A) % c::A;       }
inline int f_2(int pos)  { return (pos + 1) % c::A;                     }
inline int f_3(int pos)  { return (pos + c::w) % c::A;                  }
inline int f_4(int pos)  { return (pos - 1 + c::w) % c::A;              }
inline int f_5(int pos)  { return (pos - 1 + c::A) % c::A;              }

typedef int (*NEIGH_F) (int);
typedef int (* const CNEIGH_F) (int);

const NEIGH_F  fptr[]  = { f_0, f_1, f_2, f_3, f_4, f_5 };
const CNEIGH_F cfptr[] = { f_0, f_1, f_2, f_3, f_4, f_5 };

inline int fswitch(int i, int pos) {
    switch(i) {
    case 0 : return f_0(pos); break;
    case 1 : return f_1(pos); break;
    case 2 : return f_2(pos); break;
    case 3 : return f_3(pos); break;
    case 4 : return f_4(pos); break;
    case 5 : return f_5(pos); break;
    default : return -1; break;
    }
}
Run Code Online (Sandbox Code Playgroud)

main.cpp中

#include "head.h"
#include <iostream>
#include <time.h>

int main()
{
    int maxRepeat = 100;

    clock_t startTime = clock();
    double sum = 0;
    for (int repeat = 0; repeat < maxRepeat; repeat++)
        for (int i = 0; i < c::A; i++) {
            sum += f_0(i);
            sum += f_1(i);
            sum += f_2(i);
            sum += f_3(i);
            sum += f_4(i);
            sum += f_5(i);
        }
    std::cout << "normal time:        " << (clock() - startTime)/(double)CLOCKS_PER_SEC
                 << "  sum is: " << sum << std::endl;

    startTime = clock();
    sum = 0;
    for (int repeat = 0; repeat < maxRepeat; repeat++)
        for (int i = 0; i < c::A; i++) {
            for (int j = 0; j < 6; j++)
                sum += fptr[j](i);
        }
    std::cout << "pointer time:       " << (clock() - startTime)/(double)CLOCKS_PER_SEC
                 << "  sum is: " << sum << std::endl;

    startTime = clock();
    sum = 0;
    for (int repeat = 0; repeat < maxRepeat; repeat++)
        for (int i = 0; i < c::A; i++) {
            for (int j = 0; j < 6; j++)
                sum += cfptr[j](i);
        }
    std::cout << "const pointer time: " << (clock() - startTime)/(double)CLOCKS_PER_SEC
                 << "  sum is: " << sum << std::endl;

    startTime = clock();
    sum = 0;
    for (int repeat = 0; repeat < maxRepeat; repeat++)
        for (int i = 0; i < c::A; i++) {
            for (int j = 0; j < 6; j++)
                sum += fswitch(j, i);
        }
    std::cout << "switch time:        " << (clock() - startTime)/(double)CLOCKS_PER_SEC
                 << "  sum is: " << sum << std::endl;
    std::cin.ignore();

    return 0;
}
Run Code Online (Sandbox Code Playgroud)

功能f_i是我在实际执行使用的功能,但这里的循环是简单得多,由于测试目的,在实际执行中存在的问题第二个代码片段所示形式的几种不同的循环.

EDIT2:

我的环路的形式应保持不变,我只是想找到如何把f_i到我的环路的最佳方式.

Iwi*_*ist 1

函数f_i()Aw常数真的是给定的吗?因为如果是的话,这个问题不是可以简单地简化为表查找、加法和按位与吗?

/* Includes */
#include <stdio.h>
#include <time.h>


/* Constants */
const int w = 1024;
const int A = 1024*1024;
const int addconst[6] = {0xFFC00, 0xFFC01, 0x00001, 0x00400, 0x003FF, 0xFFFFF};
                      /*     A-w,   A-w+1,       1,       w,     w-1,     A-1 */

/* THE NOVELTY */
int ftable(int i, int pos){
    return (pos + addconst[i]) & 0xFFFFF;
}

/* Main */
int main(int argc, char* argv[]){
    clock_t timeTaken;
    int     repeat, maxRepeat = 100;
    int     i, j;
    long    sum = 0;

    timeTaken  = -clock();
    for(repeat=0;repeat<maxRepeat;repeat++)
        for(i=0;i<A;i++)
            for(j=0;j<6;j++)
                sum += ftable(j, i);
    timeTaken += clock();

    printf("Stop! Hammertime!        %f  sum is: %f\n",
           timeTaken/(double)CLOCKS_PER_SEC, (double)sum);
    return 0;
}
Run Code Online (Sandbox Code Playgroud)

请注意,当sum变量为 a时long,所花费的时间为:

Stop! Hammertime!        0.348295  sum is: 329853173760000.000000
Run Code Online (Sandbox Code Playgroud)

而当它是 a 时double,则需要两倍以上的时间:

Stop! Hammertime!        0.861563  sum is: 329853173760000.000000
Run Code Online (Sandbox Code Playgroud)

我的编译标志是:

gcc -O3 -funroll-loops -finline-functions tmp.c -o tmp
Run Code Online (Sandbox Code Playgroud)

如果您可以更多地解释函数索引如何依赖于循环索引,我可以进一步优化。