使用不同方法运行排列函数的速度会导致意外结果

Arm*_*enB 4 c++ optimization performance permutation performance-testing

我已经实现了一个isPermutation函数,给定两个字符串将返回,true如果这两个是彼此的排列,否则它将返回false.

一个使用c ++排序算法两次,而另一个使用一个int数组来跟踪字符串计数.

我运行代码几次,每次排序方法更快.我的阵列实现错了吗?

这是输出:

1
0
1
Time: 0.088 ms
1
0
1
Time: 0.014 ms
Run Code Online (Sandbox Code Playgroud)

和代码:

#include <iostream> // cout
#include <string>   // string
#include <cstring> // memset
#include <algorithm> // sort
#include <ctime> // clock_t

using namespace std;

#define MAX_CHAR 255


void PrintTimeDiff(clock_t start, clock_t end) {
    std::cout << "Time: " << (end - start) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;
}


// using array to keep a count of used chars
bool isPermutation(string inputa, string inputb) {
    int allChars[MAX_CHAR];
    memset(allChars, 0, sizeof(int) * MAX_CHAR);

    for(int i=0; i < inputa.size(); i++) {
        allChars[(int)inputa[i]]++;
    }

    for (int i=0; i < inputb.size(); i++) {
        allChars[(int)inputb[i]]--;
        if(allChars[(int)inputb[i]] < 0) {
            return false;
        }   
    }

    return true;
}


// using sorting anc comparing
bool isPermutation_sort(string inputa, string inputb) {

    std::sort(inputa.begin(), inputa.end());
    std::sort(inputb.begin(), inputb.end());

    if(inputa == inputb) return true;
    return false;
}



int main(int argc, char* argv[]) {

    clock_t  start = clock();
    cout << isPermutation("god", "dog") << endl;
    cout << isPermutation("thisisaratherlongerinput","thisisarathershorterinput") << endl;
    cout << isPermutation("armen", "ramen") << endl;
    PrintTimeDiff(start, clock());


    start = clock();
    cout << isPermutation_sort("god", "dog") << endl;
    cout << isPermutation_sort("thisisaratherlongerinput","thisisarathershorterinput") << endl;
    cout << isPermutation_sort("armen", "ramen") << endl;
    PrintTimeDiff(start, clock());

    return 0;
}
Run Code Online (Sandbox Code Playgroud)

Hal*_*Hal 5

要对此进行基准测试,您必须消除所有噪音.最简单的方法是将它包装在循环中,每次重复调用1000次左右,然后每10次迭代只吐出一次值.这样他们每个人都有类似的缓存配置文件.丢弃伪造的值(例如,由于操作系统的上下文切换导致的井喷).

通过这样做,我的方法稍快一点.摘录.

method 1 array Time: 0.768 us
method 2 sort Time: 0.840333 us

method 1 array Time: 0.621333 us
method 2 sort Time: 0.774 us

method 1 array Time: 0.769 us
method 2 sort Time: 0.856333 us

method 1 array Time: 0.766 us
method 2 sort Time: 0.850333 us

method 1 array Time: 0.802667 us
method 2 sort Time: 0.89 us

method 1 array Time: 0.778 us
method 2 sort Time: 0.841333 us
Run Code Online (Sandbox Code Playgroud)

我使用的rdtsc在这个系统上对我来说效果更好.每微秒3000个循环足够接近这个,但如果你关心读数的精确度,请确保它更准确.

#if defined(__x86_64__)
static uint64_t rdtsc()
{
    uint64_t    hi, lo;

    __asm__ __volatile__ (
                            "xor %%eax, %%eax\n"
                            "cpuid\n"
                            "rdtsc\n"
                            : "=a"(lo), "=d"(hi)
                            :: "ebx", "ecx");

    return (hi << 32)|lo;
}
#else
#error wrong architecture - implement me
#endif

void PrintTimeDiff(uint64_t start, uint64_t end) {
    std::cout << "Time: " << (end - start)/double(3000)  << " us" << std::endl;
}
Run Code Online (Sandbox Code Playgroud)

  • 每当我看到这样的东西时,我都会为那些为他们的魔法编写这种代码的人们表示敬意. (2认同)