Mean values depending on binning with respect to a second variable

Question

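I am working with python/numpy. As input data I have a large number of value pairs (x, y). I basically want to plot <y>(x), i.e. the mean value of y for each data bin of x. At the moment I use a plain for loop to do this, which is terribly slow.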

import numpy

# create example data
x = numpy.random.rand(1000)
y = numpy.random.rand(1000)
# set resolution
xbins = 100
# find x bins
H, xedges, yedges = numpy.histogram2d(x, y, bins=(xbins, xbins))
# calculate mean and std of y for each x bin
mean = numpy.zeros(xbins)
std = numpy.zeros(xbins)
for i in range(xbins):
    # boolean mask selecting the points whose x falls into bin i
    in_bin = numpy.logical_and(x >= xedges[i], x < xedges[i+1])
    mean[i] = numpy.mean(y[in_bin])
    std[i] = numpy.std(y[in_bin])


Is there a way to write this in a vectorized form?

Answer 1

Jai*_*ime 14

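You are overcomplicating things unnecessarily. All you need to know, for every bin in x, is n, sy and sy2: the number of y values in that x bin, the sum of those y values, and the sum of their squares. You can get those as: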

>>> import numpy as np
>>> n, _ = np.histogram(x, bins=xbins)
>>> sy, _ = np.histogram(x, bins=xbins, weights=y)
>>> sy2, _ = np.histogram(x, bins=xbins, weights=y*y)

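To see what the `weights` argument does, here is a small hand-checkable sketch. The data values and the `edges` array are made up for illustration and are not from the question.

```python
import numpy as np

# Made-up data: three points fall in [0, 1), two fall in [1, 2].
x = np.array([0.1, 0.5, 0.9, 1.2, 1.8])
y = np.array([1.0, 2.0, 3.0, 10.0, 20.0])
edges = np.array([0.0, 1.0, 2.0])

# Without weights, each point adds 1 to its bin: a plain count.
n, _ = np.histogram(x, bins=edges)
# With weights, each point adds its weight instead: a per-bin sum.
sy, _ = np.histogram(x, bins=edges, weights=y)
sy2, _ = np.histogram(x, bins=edges, weights=y * y)

print(n)    # counts per bin: 3 and 2
print(sy)   # sums of y per bin: 6 and 30
print(sy2)  # sums of y**2 per bin: 14 and 500
```

Each call makes one pass over the data with no Python-level loop over the bins, which is where the speed-up comes from.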

From those:

>>> mean = sy / n
>>> std = np.sqrt(sy2/n - mean*mean)

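Putting the pieces together, a minimal sketch might look like the following. The helper name `binned_mean_std`, the fixed seed and the choice of 20 bins are assumptions for illustration, not part of the answer. The sketch also guards two cases the snippets above do not handle: empty bins (division by zero, which gives NaN) and a variance that rounding pushes slightly below zero.

```python
import numpy as np

def binned_mean_std(x, y, bins):
    """Hypothetical helper: per-bin mean and population std of y, binned by x."""
    # Compute the edges once so all three histograms use identical bins.
    edges = np.histogram_bin_edges(x, bins=bins)
    n, _ = np.histogram(x, bins=edges)
    sy, _ = np.histogram(x, bins=edges, weights=y)
    sy2, _ = np.histogram(x, bins=edges, weights=y * y)
    # Empty bins give 0/0; suppress the warning and let them become NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        mean = sy / n
        var = sy2 / n - mean * mean
    # Rounding can push a tiny variance slightly below zero; clip before sqrt.
    std = np.sqrt(np.clip(var, 0.0, None))
    return mean, std, edges

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = rng.random(1000)
y = rng.random(1000)
mean, std, edges = binned_mean_std(x, y, bins=20)

# Cross-check against an explicit loop over the same bins.
# np.histogram closes the last bin on the right, so clip the index to match.
idx = np.clip(np.digitize(x, edges) - 1, 0, len(edges) - 2)
loop_mean = np.array([y[idx == i].mean() for i in range(len(edges) - 1)])
loop_std = np.array([y[idx == i].std() for i in range(len(edges) - 1)])
print(np.allclose(mean, loop_mean), np.allclose(std, loop_std))
```

Note that `sy2/n - mean*mean` is the population variance (the same as `numpy.std` with its default `ddof=0`), and that this one-pass formula can lose precision when the mean is large compared with the spread.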

@JakobS. Nobody does... until the first time they see it! (2 upvotes)

Archived:	12 years, 10 months ago
Views:	2662
Last activity:	12 years, 10 months ago