How to get an integer array from numpy.bincount when the weights are integers

Question

Consider the numpy array a

a = np.array([1, 0, 2, 1, 1])

If I do a bin count, I get integers:

np.bincount(a)

array([1, 3, 1])

But if I add weights to perform the equivalent bin count:

np.bincount(a, np.ones_like(a))

array([ 1.,  3.,  1.])

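The values are the same, but they are floats. What is the most sensible way to get these as ints? And why doesn't numpy assume the same dtype as the one passed as weights?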

Answer 1

MSe*_*ert 3

Why doesn't numpy use the same dtype as the one passed as weights?

There are two reasons:

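1. There are several ways to weight counts: you can multiply the values by the weights, or multiply the values by the weights divided by the sum of the weights. In the latter case the result will always be a double (simply because the division would otherwise be inaccurate).

   In my experience, weighting with normalized weights (the second case) is more common. So it is actually reasonable (and definitely faster) to assume they are floats.

2. Overflow. Plain counts cannot exceed the integer limit, because an array cannot have more elements than that limit (which makes sense, since otherwise you could not index the array). But once you multiply by weights, it is not hard to make the counts "overflow".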
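
To illustrate the first reason, here is a minimal sketch of plain versus normalized weighting. The `raw` weights are made-up illustrative values, not from the original question. Dividing by the sum produces fractional bins, so an integer output dtype could not represent this case anyway.

```python
import numpy as np

a = np.array([1, 0, 2, 1, 1])
raw = np.array([2, 1, 1, 3, 1])  # hypothetical integer weights for illustration

# Plain weighted count: per-bin sum of the weights (returned as float64).
weighted = np.bincount(a, weights=raw)

# Normalized weighted count: weights divided by their total, so the bins sum to 1.
normalized = np.bincount(a, weights=raw / raw.sum())

print(weighted)    # bin 0 -> 1, bin 1 -> 2 + 3 + 1 = 6, bin 2 -> 1
print(normalized)  # the same bins divided by the total weight 8
```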

I suspect that in this case it is mostly the latter reason.

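It is unlikely that anyone would use very large integer weights together with lots of repeated values, but suppose for a moment what would happen with: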

import numpy as np

i = 10000000
# i values, all in bin 1, each with weight 10**12: the true total is 10**19
np.bincount(np.ones(i, dtype=int), weights=np.ones(i, dtype=int)*1000000000000)

If bincount kept the integer dtype, this would return:

array([0, -8446744073709551616])

instead of the actual result:

array([  0.00000000e+00,   1.00000000e+19])
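
The same overflow can be reproduced at a much smaller scale. This sketch uses 10 weights of 10**18 instead of 10**7 weights of 10**12 (the same total of 10**19, which exceeds the int64 maximum of 2**63 - 1, roughly 9.22e18). It uses `np.add.at` with an int64 accumulator as a stand-in for a hypothetical integer-returning bincount; that is an assumption for illustration, not what `np.bincount` does internally.

```python
import numpy as np

idx = np.ones(10, dtype=np.int64)        # every value falls into bin 1
w = np.full(10, 10**18, dtype=np.int64)  # true total for bin 1: 10**19

# Hypothetical integer bincount: accumulate into an int64 array.
# Integer array arithmetic wraps around silently on overflow.
int_counts = np.zeros(2, dtype=np.int64)
np.add.at(int_counts, idx, w)

# The real np.bincount accumulates in float64 and keeps the right magnitude.
float_counts = np.bincount(idx, weights=w)

print(int_counts)    # bin 1 has wrapped around to a large negative number
print(float_counts)  # bin 1 holds 1e19
```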

Combine that with the first reason and the fact that it is very easy (trivial, in my opinion) to convert a float array to an integer array:

np.asarray(np.bincount(...), dtype=int)
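
A slightly more defensive version of that conversion, as a sketch: `dtype=int` truncates toward zero, so rounding first guards against a total like 2.9999999 becoming 2 if the weights were themselves computed as floats. The sketch also checks that the totals are below 2**53, above which float64 can no longer represent every integer exactly. The helper `int_bincount` is a hypothetical name, not a NumPy API, and the check only looks at the final totals, not the partial sums.

```python
import numpy as np

def int_bincount(x, weights, minlength=0):
    """Weighted bincount returned as int64 (hypothetical helper, not part of NumPy)."""
    counts = np.bincount(x, weights=weights, minlength=minlength)
    # float64 holds every integer exactly only up to 2**53; larger totals
    # may already have lost precision, so refuse to convert them.
    if counts.size and np.abs(counts).max() >= 2**53:
        raise OverflowError("weighted counts too large to convert to int exactly")
    # Round to the nearest integer before casting; astype alone truncates toward zero.
    return np.rint(counts).astype(np.int64)

a = np.array([1, 0, 2, 1, 1])
print(int_bincount(a, np.ones_like(a)))  # same values as np.bincount(a), dtype int64
```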

that is probably the "actual" reason why a weighted bincount returns float.

The "literal" reason:

The numpy source actually states that the weights must be promotable to double (float64):

/*
 * arr_bincount is registered as bincount.
 *
 * bincount accepts one, two or three arguments. The first is an array of
 * non-negative integers The second, if present, is an array of weights,
 * which must be promotable to double. Call these arguments list and
 * weight. Both must be one-dimensional with len(weight) == len(list). If
 * weight is not present then bincount(list)[i] is the number of occurrences
 * of i in list.  If weight is present then bincount(self,list, weight)[i]
 * is the sum of all weight[j] where list [j] == i.  Self is not used.
 * The third argument, if present, is a minimum length desired for the
 * output array.
 */
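
The semantics described in that comment can be restated as a slow, loop-based reference, as a sketch for checking understanding only. `bincount_reference` is a hypothetical name; the explicit conversion of the weights to `np.float64` mirrors the "promotable to double" requirement rather than reproducing NumPy's actual C implementation.

```python
import numpy as np

def bincount_reference(lst, weight=None, minlength=0):
    """Loop-based restatement of the arr_bincount comment (hypothetical helper)."""
    lst = np.asarray(lst)
    if lst.ndim != 1 or (lst.size and lst.min() < 0):
        raise ValueError("list must be a 1-D array of non-negative integers")
    n = max(int(lst.max()) + 1 if lst.size else 0, minlength)

    if weight is None:
        # bincount(list)[i] is the number of occurrences of i in list.
        out = np.zeros(n, dtype=np.intp)
        for v in lst:
            out[v] += 1
        return out

    # The weights are converted to double, so the result is float64 whatever their dtype.
    weight = np.asarray(weight, dtype=np.float64)
    if weight.shape != lst.shape:
        raise ValueError("len(weight) must equal len(list)")
    # bincount(list, weight)[i] is the sum of weight[j] where list[j] == i.
    out = np.zeros(n, dtype=np.float64)
    for v, w in zip(lst, weight):
        out[v] += w
    return out
```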

And indeed, the function then converts the weights to double. That is the "literal" reason you get a floating-point result.
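
A quick way to observe this, assuming a reasonably recent NumPy: pass weights of several different dtypes and inspect the dtype of the result. All of the dtypes below cast safely to double, and the weighted result comes back as float64 each time.

```python
import numpy as np

a = np.array([1, 0, 2, 1, 1])

# Whatever the weights' dtype, the weighted result is float64,
# consistent with the weights being converted to double inside bincount.
for dt in (np.int8, np.int64, np.uint64, np.bool_, np.float32):
    w = np.ones(a.shape, dtype=dt)
    print(dt.__name__, np.bincount(a, weights=w).dtype)
```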

Archived: 8 years, 7 months ago
Views: 484
Last activity: 8 years, 7 months ago