Ash*_*ppa 5 python performance
我的Python程序太慢了.因此,我对其进行了分析,发现大部分时间都花在计算两点之间距离的函数上(一个点是3个Python浮点数的列表):
def get_dist(pt0, pt1):
val = 0
for i in range(3):
val += (pt0[i] - pt1[i]) ** 2
val = math.sqrt(val)
return val
Run Code Online (Sandbox Code Playgroud)
为了分析为什么这个函数这么慢,我编写了两个测试程序:一个用Python编写,一个用C++编写类似的计算.他们计算了100万对点之间的距离.(Python和C++中的测试代码如下.)
Python计算需要2秒,而C++需要0.02秒.100倍的差异!
为什么Python代码比C++代码这么简单的数学计算要慢得多?如何加快速度以匹配C++性能?
用于测试的Python代码:
import math, random, time
num = 1000000
# Generate random points and numbers
pt_list = []
rand_list = []
for i in range(num):
pt = []
for j in range(3):
pt.append(random.random())
pt_list.append(pt)
rand_list.append(random.randint(0, num - 1))
# Compute
beg_time = time.clock()
dist = 0
for i in range(num):
pt0 = pt_list[i]
ri = rand_list[i]
pt1 = pt_list[ri]
val = 0
for j in range(3):
val += (pt0[j] - pt1[j]) ** 2
val = math.sqrt(val)
dist += val
end_time = time.clock()
elap_time = (end_time - beg_time)
print elap_time
print dist
Run Code Online (Sandbox Code Playgroud)
用于测试的C++代码:
#include <cstdlib>
#include <iostream>
#include <ctime>
#include <cmath>
struct Point
{
double v[3];
};
int num = 1000000;
int main()
{
// Allocate memory
Point** pt_list = new Point*[num];
int* rand_list = new int[num];
// Generate random points and numbers
for ( int i = 0; i < num; ++i )
{
Point* pt = new Point;
for ( int j = 0; j < 3; ++j )
{
const double r = (double) rand() / (double) RAND_MAX;
pt->v[j] = r;
}
pt_list[i] = pt;
rand_list[i] = rand() % num;
}
// Compute
clock_t beg_time = clock();
double dist = 0;
for ( int i = 0; i < num; ++i )
{
const Point* pt0 = pt_list[i];
int r = rand_list[i];
const Point* pt1 = pt_list[r];
double val = 0;
for ( int j = 0; j < 3; ++j )
{
const double d = pt0->v[j] - pt1->v[j];
val += ( d * d );
}
val = sqrt(val);
dist += val;
}
clock_t end_time = clock();
double sec_time = (end_time - beg_time) / (double) CLOCKS_PER_SEC;
std::cout << sec_time << std::endl;
std::cout << dist << std::endl;
return 0;
}
Run Code Online (Sandbox Code Playgroud)
一系列优化:
import math, random, time
num = 1000000
# Generate random points and numbers
# Change #1: Sometimes it's good not to have too much randomness.
# This is one of those cases.
# Changing the code shouldn't change the results.
# Using a fixed seed ensures that the changes are valid.
# The final 'print dist' should yield the same result regardless of optimizations.
# Note: There's nothing magical about this seed.
# I randomly picked a hash tag from a git log.
random.seed (0x7126434a2ea2a259e9f4196cbb343b1e6d4c2fc8)
pt_list = []
rand_list = []
for i in range(num):
pt = []
for j in range(3):
pt.append(random.random())
pt_list.append(pt)
# Change #2: rand_list is computed in a separate loop.
# This ensures that upcoming optimizations will get the same results as
# this unoptimized version.
for i in range(num):
rand_list.append(random.randint(0, num - 1))
# Compute
beg_time = time.clock()
dist = 0
for i in range(num):
pt0 = pt_list[i]
ri = rand_list[i]
pt1 = pt_list[ri]
val = 0
for j in range(3):
val += (pt0[j] - pt1[j]) ** 2
val = math.sqrt(val)
dist += val
end_time = time.clock()
elap_time = (end_time - beg_time)
print elap_time
print dist
Run Code Online (Sandbox Code Playgroud)
第一个优化(未示出)是嵌入除import函数之外的所有代码.这个简单的更改为我的计算机提供了36%的性能提升.
**操作员.你没有pow(d,2)在你的C代码中使用,因为每个人都知道这在C中是不理想的.它在python中也是次优的.Python **很聪明; 它评估x**2为x*x.但是,聪明需要时间.你知道你想要的d*d,所以要用它.这是使用该优化的计算循环:
for i in range(num):
pt0 = pt_list[i]
ri = rand_list[i]
pt1 = pt_list[ri]
val = 0
for j in range(3):
d = pt0[j] - pt1[j]
val += d*d
val = math.sqrt(val)
dist += val
Run Code Online (Sandbox Code Playgroud)
你的Python代码看起来很像你的C代码.你没有利用这种语言.
import math, random, time, itertools
def main (num=1000000) :
# This small optimization speeds things up by a couple percent.
sqrt = math.sqrt
# Generate random points and numbers
random.seed (0x7126434a2ea2a259e9f4196cbb343b1e6d4c2fc8)
def random_point () :
return [random.random(), random.random(), random.random()]
def random_index () :
return random.randint(0, num-1)
# Big optimization:
# Don't generate the lists of points.
# Instead use list comprehensions that create iterators.
# It's best to avoid creating lists of millions of entities when you don't
# need those lists. You don't need the lists; you just need the iterators.
pt_list = [random_point() for i in xrange(num)]
rand_pts = [pt_list[random_index()] for i in xrange(num)]
# Compute
beg_time = time.clock()
dist = 0
# Don't loop over a range. That's too C-like.
# Instead loop over some iterable, preferably one that doesn't create the
# collection over which the iteration is to occur.
# This is particularly important when the collection is large.
for (pt0, pt1) in itertools.izip (pt_list, rand_pts) :
# Small optimization: inner loop inlined,
# intermediate variable 'val' eliminated.
d0 = pt0[0]-pt1[0]
d1 = pt0[1]-pt1[1]
d2 = pt0[2]-pt1[2]
dist += sqrt(d0*d0 + d1*d1 + d2*d2)
end_time = time.clock()
elap_time = (end_time - beg_time)
print elap_time
print dist
Run Code Online (Sandbox Code Playgroud)
以下大约需要原始版本在代码的定时部分中的1/40.不如C快,但接近.
请注意注释掉的"Mondo slow"计算.这大约是原始版本的十倍.使用numpy需要管理费用.与我的非numpy优化#3相比,下面的代码中的设置需要相当长的时间.
结论:使用numpy时需要注意,设置成本可能很高.
import numpy, random, time
def main (num=1000000) :
# Generate random points and numbers
random.seed (0x7126434a2ea2a259e9f4196cbb343b1e6d4c2fc8)
def random_point () :
return [random.random(), random.random(), random.random()]
def random_index () :
return random.randint(0, num-1)
pt_list = numpy.array([random_point() for i in xrange(num)])
rand_pts = pt_list[[random_index() for i in xrange(num)],:]
# Compute
beg_time = time.clock()
# Mondo slow.
# dist = numpy.sum (
# numpy.apply_along_axis (
# numpy.linalg.norm, 1, pt_list - rand_pts))
# Mondo fast.
dist = numpy.sum ((numpy.sum ((pt_list-rand_pts)**2, axis=1))**0.5)
end_time = time.clock()
elap_time = (end_time - beg_time)
print elap_time
print dist
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
1271 次 |
| 最近记录: |