Python和C++之间不同寻常的速度差异

Kee*_*Jay 37 c++ python algorithm performance

我最近写了一个简短的算法来计算python中的快乐数字.该程序允许您选择一个上限,它将确定它下面的所有快乐数字.为了进行速度比较,我决定将我所知道的算法从python直接翻译成c ++.

令人惊讶的是,c ++版本的运行速度明显慢于python版本.在发现前10,000个满意数字的执行时间之间进行准确的速度测试表明python程序平均在0.59秒内运行,而c ++版本平均在8.5秒内运行.

我将这个速度差异归结为我必须在c ++版本中编写部分计算的辅助函数(例如,确定元素是否在列表/数组/向量中),这些函数已经内置到python语言中.

首先,这是否是这种荒谬的速度差异的真正原因,其次,如何更改c ++版本以比python版本更快地执行(在我看来应该如此).

带有速度测试的两段代码在这里:Python Version,C++ Version.谢谢您的帮助.

#include <iostream>
#include <vector>
#include <string>
#include <ctime>
#include <windows.h>

using namespace std;

bool inVector(int inQuestion, vector<int> known);
int sum(vector<int> given);
int pow(int given, int power);
void calcMain(int upperBound);

int main()
{
    while(true)
    {
        int upperBound;
        cout << "Pick an upper bound: ";
        cin >> upperBound;
        long start, end;
        start = GetTickCount();
        calcMain(upperBound);
        end = GetTickCount();
        double seconds = (double)(end-start) / 1000.0;
        cout << seconds << " seconds." << endl << endl;
    }
    return 0;
}

void calcMain(int upperBound)
{
    vector<int> known;
    for(int i = 0; i <= upperBound; i++)
    {
        bool next = false;
        int current = i;
        vector<int> history;
        while(!next)
        {
            char* buffer = new char[10];
            itoa(current, buffer, 10);
            string digits = buffer;
            delete buffer;
            vector<int> squares;
            for(int j = 0; j < digits.size(); j++)
            {
                char charDigit = digits[j];
                int digit = atoi(&charDigit);
                int square = pow(digit, 2);
                squares.push_back(square);
            }
            int squaresum = sum(squares);
            current = squaresum;
            if(inVector(current, history))
            {
                next = true;
                if(current == 1)
                {
                    known.push_back(i);
                    //cout << i << "\t";
                }
            }
            history.push_back(current);
        }
    }
    //cout << "\n\n";
}

bool inVector(int inQuestion, vector<int> known)
{
    for(vector<int>::iterator it = known.begin(); it != known.end(); it++)
        if(*it == inQuestion)
            return true;
    return false;
}

int sum(vector<int> given)
{
    int sum = 0;
    for(vector<int>::iterator it = given.begin(); it != given.end(); it++)
        sum += *it;
    return sum;
}

int pow(int given, int power)
{
    int original = given;
    int current = given;
    for(int i = 0; i < power-1; i++)
        current *= original;
    return current;
}
Run Code Online (Sandbox Code Playgroud)
#!/usr/bin/env python

import timeit

upperBound = 0

def calcMain():
    known = []
    for i in range(0,upperBound+1):
        next = False
        current = i
        history = []
        while not next:
            digits = str(current)
            squares = [pow(int(digit), 2) for digit in digits]
            squaresum = sum(squares)
            current = squaresum
            if current in history:
                next = True
                if current == 1:
                    known.append(i)
                    ##print i, "\t",
            history.append(current)
    ##print "\nend"

while True:    
    upperBound = input("Pick an upper bound: ")
    result = timeit.Timer(calcMain).timeit(1)
    print result, "seconds.\n"
Run Code Online (Sandbox Code Playgroud)

Asi*_*sik 154

对于100000个元素,Python代码需要6.9秒,而C++最初需要37秒.

我对您的代码进行了一些基本的优化,并设法使C++代码比Python实现快100倍.它现在在0.06秒内完成100000个元素.这比原始C++代码快617倍.

最重要的是在发布模式下进行编译,并进行所有优化.在调试模式下,此代码实际上要慢一个数量级.

接下来,我将解释我所做的优化.

  • 将所有向量声明移到循环之外; 用clear()操作替换它们,这比调用构造函数要快得多.
  • 用乘法:value*值替换对pow(value,2)的调用.
  • 我没有使用方形向量和调用求和,而是仅使用整数就地求值.
  • 避免所有字符串操作,与整数操作相比,这些操作非常慢.例如,可以通过重复除以10并获取结果值的模数10来计算每个数字的平方,而不是将该值转换为字符串,然后将每个字符转换回int.
  • 避免所有矢量副本,首先通过传递引用替换传递值,最后完全消除辅助函数.
  • 消除了一些临时变量.
  • 可能很多细节我都忘记了.比较你的代码和我的并排,看看我做了什么.

通过使用预先分配的数组而不是向量,可以更好地优化代码,但这将是更多的工作,我将把它作为练习留给读者.:P

这是优化的代码:

#include <iostream>
#include <vector>
#include <string>
#include <ctime>
#include <algorithm>
#include <windows.h>

using namespace std;

void calcMain(int upperBound, vector<int>& known);

int main()
{
    while(true)
    {
        vector<int> results;
        int upperBound;
        cout << "Pick an upper bound: ";
        cin >> upperBound;
        long start, end;
        start = GetTickCount();
        calcMain(upperBound, results);
        end = GetTickCount();
        for (size_t i = 0; i < results.size(); ++i) {
            cout << results[i] << ", ";
        }
        cout << endl;
        double seconds = (double)(end-start) / 1000.0;
        cout << seconds << " seconds." << endl << endl;
    }
    return 0;
}

void calcMain(int upperBound, vector<int>& known)
{
    vector<int> history;
    for(int i = 0; i <= upperBound; i++)
    {
        int current = i;
        history.clear();
        while(true)
        {
                int temp = current;
                int sum = 0;
                while (temp > 0) {
                    sum += (temp % 10) * (temp % 10);
                    temp /= 10;
                }
                current = sum;
                if(find(history.begin(), history.end(), current) != history.end())
                {
                        if(current == 1)
                        {
                                known.push_back(i);
                        }
                        break;
                }
                history.push_back(current);
        }
    }
}
Run Code Online (Sandbox Code Playgroud)

  • 谢谢!你是我对SO的第一个积极评论.:) (8认同)
  • 请停止发送垃圾邮件.如果人们阅读了你的答案,他们就会赞成.不要把它推到喉咙里. (5认同)
  • @toto:从什么时候向量和迭代器变得低效? (5认同)
  • 用注释解释这些变化可能会更好,所以更清楚的是发生了什么. (2认同)

ily*_* n. 20

有一个新的,速度更快的版本作为单独的答案,所以这个答案已被弃用.


我重写了你的算法,只要它找到快乐或不快乐的数字就进行缓存.我也试图尽可能地使它成为pythonic,例如通过创建单独的函数digits()happy().很抱歉使用Python 3,但我也可以从中展示一些有用的东西.

这个版本要快得多.它运行在1.7S这是10倍比你原来的程序,需要更快的18S(好,我的MacBook是相当老又慢:))

#!/usr/bin/env python3

from timeit import Timer
from itertools import count

print_numbers = False
upperBound = 10**5  # Default value, can be overidden by user.


def digits(x:'nonnegative number') -> "yields number's digits":
    if not (x >= 0): raise ValueError('Number should be nonnegative')
    while x:
        yield x % 10
        x //= 10


def happy(number, known = {1}, happies = {1}) -> 'True/None':
    '''This function tells if the number is happy or not, caching results.

    It uses two static variables, parameters known and happies; the
    first one contains known happy and unhappy numbers; the second 
    contains only happy ones.

    If you want, you can pass your own known and happies arguments. If
    you do, you should keep the assumption commented out on the 1 line.

    '''

#        assert 1 in known and happies <= known  # <= is expensive

    if number in known:
        return number in happies

    history = set()
    while True:
        history.add(number)
        number = sum(x**2 for x in digits(number))
        if number in known or number in history:
            break

    known.update(history)
    if number in happies:
        happies.update(history)
        return True


def calcMain():
    happies = {x for x in range(upperBound) if happy(x) }
    if print_numbers:
        print(happies)


if __name__ == '__main__':
    upperBound = eval(
            input("Pick an upper bound [default {0}]: "
                    .format(upperBound)).strip()
            or repr(upperBound))
    result = Timer(calcMain).timeit(1)
    print ('This computation took {0} seconds'.format(result))
Run Code Online (Sandbox Code Playgroud)


Joh*_*ohn 16

看起来你正在按值传递向量到其他函数.这将是一个显着的减速,因为程序实际上会在向量传递给函数之前制作完整的矢量副本.要解决此问题,请将常量引用传递给向量而不是副本.所以代替:

int sum(vector<int> given)
Run Code Online (Sandbox Code Playgroud)

使用:

int sum(const vector<int>& given)
Run Code Online (Sandbox Code Playgroud)

执行此操作时,您将无法再使用vector :: iterator,因为它不是常量.你需要用vector :: const_iterator替换它.

您也可以传入非常量引用,但在这种情况下,您根本不需要修改参数.

  • 可能......这意味着它对性能比较毫无用处.Python可能比C++更好,这并不奇怪. (2认同)

Edi*_*enz 7

我可以看到你有很多不必要的堆分配

例如:

while(!next)
        {
            char* buffer = new char[10];
Run Code Online (Sandbox Code Playgroud)

这看起来不是很优化.因此,您可能希望预先分配数组并在循环中使用它.这是一种易于发现和完成的基本优化技术.它也可能变得一团糟,所以要小心.

你也在使用atoi()函数,我真的不知道它是否真的被优化了.也许做模数10并得到数字可能会更好(你必须衡量你,我没有测试这个).

您进行线性搜索(inVector)的事实可能不好.用std :: set替换矢量数据结构可能会加快速度.hash_set也可以做到这一点.

但我认为最糟糕的问题是字符串和这个循环中堆上的东西的分配.这看起来不太好.我会先尝试那些地方.

  • 对于它的价值,如果你每次都要分配一个10字节的数组,为什么不把它声明为一个10字节的数组呢? (4认同)

ily*_* n. 6

这是我的第二个答案; 它可以缓存诸如平方和之类的<= 10**6:

        happy_list[sq_list[x%happy_base] + sq_list[x//happy_base]]
Run Code Online (Sandbox Code Playgroud)

那是,

  • 号码分为3位+ 3位数
  • 预先计算的表来获得平方和更换零件
  • 这两个结果被添加
  • 咨询预先计算的表以获得数字的幸福:

我不认为Python版本可以比这更快(好吧,如果你扔掉旧版本,这是try:开销,它快10%).

我认为这是一个很好的问题,它表明,事实上,

  • 必须快速的事情应该用C语写
  • 但是,通常你不需要快速的东西(即使你需要程序运行一天,它会少于程序员优化它的总时间)
  • 用Python编写程序更容易,更快捷
  • 但对于某些问题,尤其是计算问题,C++解决方案(如上所述)实际上比尝试优化Python程序更具可读性和美观性.

好的,在这里(第2版现在......):

#!/usr/bin/env python3
'''Provides slower and faster versions of a function to compute happy numbers.

slow_happy() implements the algorithm as in the definition of happy
numbers (but also caches the results).

happy() uses the precomputed lists of sums of squares and happy numbers
to return result in just 3 list lookups and 3 arithmetic operations for
numbers less than 10**6; it falls back to slow_happy() for big numbers.

Utilities: digits() generator, my_timeit() context manager.

'''


from time import time  # For my_timeit.
from random import randint # For example with random number.

upperBound = 10**5  # Default value, can be overridden by user.


class my_timeit:
    '''Very simple timing context manager.'''

    def __init__(self, message):
        self.message = message
        self.start = time()

    def __enter__(self):
        return self

    def __exit__(self, *data):
        print(self.message.format(time() - self.start))


def digits(x:'nonnegative number') -> "yields number's digits":
    if not (x >= 0): raise ValueError('Number should be nonnegative')
    while x:
        yield x % 10
        x //= 10


def slow_happy(number, known = {1}, happies = {1}) -> 'True/None':
    '''Tell if the number is happy or not, caching results.

    It uses two static variables, parameters known and happies; the
    first one contains known happy and unhappy numbers; the second 
    contains only happy ones.

    If you want, you can pass your own known and happies arguments. If
    you do, you should keep the assumption commented out on the 1 line.

    '''
    # This is commented out because <= is expensive.
    # assert {1} <= happies <= known 

    if number in known:
        return number in happies

    history = set()
    while True:
        history.add(number)
        number = sum(x**2 for x in digits(number))
        if number in known or number in history:
            break

    known.update(history)
    if number in happies:
        happies.update(history)
        return True


# This will define new happy() to be much faster ------------------------.

with my_timeit('Preparation time was {0} seconds.\n'):

    LogAbsoluteUpperBound = 6 # The maximum possible number is 10**this.
    happy_list = [slow_happy(x)
                  for x in range(81*LogAbsoluteUpperBound + 1)]
    happy_base = 10**((LogAbsoluteUpperBound + 1)//2)
    sq_list = [sum(d**2 for d in digits(x))
               for x in range(happy_base + 1)]

    def happy(x):
        '''Tell if the number is happy, optimized for smaller numbers.

        This function works fast for numbers <= 10**LogAbsoluteUpperBound.

        '''
        try:
            return happy_list[sq_list[x%happy_base] + sq_list[x//happy_base]]
        except IndexError:
            return slow_happy(x)

# End of happy()'s redefinition -----------------------------------------.


def calcMain(print_numbers, upper_bound):
    happies = [x for x in range(upper_bound + 1) if happy(x)]
    if print_numbers:
        print(happies)


if __name__ == '__main__':
    while True:

        upperBound = eval(input(
            "Pick an upper bound [{0} default, 0 ends, negative number prints]: "
            .format(upperBound)).strip() or repr(upperBound))
        if not upperBound:
            break

        with my_timeit('This computation took {0} seconds.'):
            calcMain(upperBound < 0, abs(upperBound))

        single = 0
        while not happy(single):
            single = randint(1, 10**12)
        print('FYI, {0} is {1}.\n'.format(single,
                    'happy' if happy(single) else 'unhappy')) 

    print('Nice to see you, goodbye!')
Run Code Online (Sandbox Code Playgroud)