Agu*_*guy 4 python statistics numpy python-3.x
This surprised me... To illustrate, I used this small piece of code to compute the mean and median of 1M random numbers:
import numpy as np
import statistics as st
import time
listofrandnum = np.random.rand(1000000,)
t = time.time()
print('mean is:', st.mean(listofrandnum))
print('time to calc mean:', time.time()-t)
print('\n')
t = time.time()
print('median is:', st.median(listofrandnum))
print('time to calc median:', time.time()-t)
The results are as follows:
mean is: 0.499866595037
time to calc mean: 2.0767598152160645
median is: 0.499721597395
time to calc median: 0.9687695503234863
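My question: why is the mean slower than the median? The median needs some sorting algorithm (i.e. comparisons), while the mean only needs a sum. Is summing slower than comparing?
I would greatly appreciate your insight on this.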
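statistics is not part of NumPy. It is a Python standard library module with a rather different design philosophy: it goes for accuracy at almost any cost, even for unusual input data types and extremely ill-conditioned input. Performing a sum the way the statistics module does it is very expensive, more so than performing a sort.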
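To make that accuracy trade-off concrete, here is a small sketch using the ill-conditioned data that appears in the statistics source docstring. The explicit loop stands in for plain left-to-right float addition; that is an assumption made for illustration, not a description of how either library is implemented.

```python
import math
import statistics

# Ill-conditioned input: huge values that cancel, with small ones in between.
data = [1e50, 1.0, -1e50] * 1000

# Plain left-to-right float addition loses every 1.0 to round-off.
naive_total = 0.0
for x in data:
    naive_total += x

exact_total = math.fsum(data)        # correctly rounded float sum
exact_mean = statistics.mean(data)   # computed through exact fractions

print("naive total:    ", naive_total)
print("math.fsum total:", exact_total)
print("statistics.mean:", exact_mean)
```

The naive loop drops the small terms entirely, while statistics.mean recovers them. That robustness is what the extra running time pays for.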
If you want an efficient mean or median over a NumPy array, use the NumPy routines:
numpy.mean(whatever)
numpy.median(whatever)
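A rough way to compare the two approaches side by side is sketched below. The helper name timed, the array size, and the seed are all assumptions chosen for illustration. The array is smaller than the 1M values in the question so the run stays short, and it is converted to a list of plain Python floats before being handed to the statistics module. Absolute timings will vary by machine and Python version.

```python
import statistics
import time

import numpy as np


def timed(func, values):
    """Return (result, elapsed_seconds) for a single call of func(values)."""
    start = time.perf_counter()
    result = func(values)
    return result, time.perf_counter() - start


rng = np.random.default_rng(0)
arr = rng.random(100_000)   # smaller than the question's 1M values
as_list = arr.tolist()      # plain Python floats for the statistics module

np_mean, t_np_mean = timed(np.mean, arr)
np_median, t_np_median = timed(np.median, arr)
st_mean, t_st_mean = timed(statistics.mean, as_list)
st_median, t_st_median = timed(statistics.median, as_list)

print(f"numpy.mean        {np_mean:.6f}  {t_np_mean:.6f} s")
print(f"numpy.median      {np_median:.6f}  {t_np_median:.6f} s")
print(f"statistics.mean   {st_mean:.6f}  {t_st_mean:.6f} s")
print(f"statistics.median {st_median:.6f}  {t_st_median:.6f} s")
```

Both libraries should agree on the values to within float round-off; only the time spent differs.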
If you want to see the expensive work the statistics module goes through for a simple sum, you can look at the source code:
def _sum(data, start=0):
    """_sum(data [, start]) -> (type, sum, count)

    Return a high-precision sum of the given numeric data as a fraction,
    together with the type to be converted to and the count of items.

    If optional argument ``start`` is given, it is added to the total.
    If ``data`` is empty, ``start`` (defaulting to 0) is returned.

    Examples
    --------

    >>> _sum([3, 2.25, 4.5, -0.5, 1.0], 0.75)
    (<class 'float'>, Fraction(11, 1), 5)

    Some sources of round-off error will be avoided:

    >>> _sum([1e50, 1, -1e50] * 1000)  # Built-in sum returns zero.
    (<class 'float'>, Fraction(1000, 1), 3000)

    Fractions and Decimals are also supported:

    >>> from fractions import Fraction as F
    >>> _sum([F(2, 3), F(7, 5), F(1, 4), F(5, 6)])
    (<class 'fractions.Fraction'>, Fraction(63, 20), 4)

    >>> from decimal import Decimal as D
    >>> data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")]
    >>> _sum(data)
    (<class 'decimal.Decimal'>, Fraction(6963, 10000), 4)

    Mixed types are currently treated as an error, except that int is
    allowed.
    """
    count = 0
    n, d = _exact_ratio(start)
    partials = {d: n}
    partials_get = partials.get
    T = _coerce(int, type(start))
    for typ, values in groupby(data, type):
        T = _coerce(T, typ)  # or raise TypeError
        for n,d in map(_exact_ratio, values):
            count += 1
            partials[d] = partials_get(d, 0) + n
    if None in partials:
        # The sum will be a NAN or INF. We can ignore all the finite
        # partials, and just look at this special one.
        total = partials[None]
        assert not _isfinite(total)
    else:
        # Sum all the partial sums using builtin sum.
        # FIXME is this faster if we sum them in order of the denominator?
        total = sum(Fraction(n, d) for d, n in sorted(partials.items()))
    return (T, total, count)
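The core idea in that source can be boiled down to a much smaller sketch. The function exact_sum below is a hypothetical, simplified stand-in written for this illustration. It handles finite floats only and skips the type coercion, the start value, and the NAN/INF handling that the real _sum performs.

```python
from collections import defaultdict
from fractions import Fraction


def exact_sum(values):
    """Sum finite floats exactly (simplified illustration, not the stdlib code)."""
    partials = defaultdict(int)  # denominator -> running numerator total
    for x in values:
        n, d = x.as_integer_ratio()  # exact representation: x == n / d
        partials[d] += n             # arbitrary-precision integer addition
    # Combine the per-denominator totals as exact rationals.
    return sum((Fraction(n, d) for d, n in partials.items()), Fraction(0))


tenths = [0.1] * 10

naive = 0.0
for x in tenths:
    naive += x  # ordinary float addition, rounding at every step

print("naive float loop:", naive)
print("exact, then rounded once:", float(exact_sum(tenths)))
```

Every element costs a conversion to a pair of Python integers, a dictionary lookup, and big-integer arithmetic, all in interpreted code. A sort over floats only needs cheap comparisons, which is consistent with the mean being slower than the median here.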