How do I handle large integers in NumPy?

lap*_*ita 2 python numpy scientific-computing pandas data-science

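I am working on a data analysis project that involves very large numbers. I originally did everything in pure Python, but I am now trying to move to NumPy and pandas. However, I seem to have hit a wall, because NumPy cannot handle integers larger than 64 bits (if I use Python integers in NumPy, they max out at 9223372036854775807). Do I have to abandon NumPy and pandas entirely, or is there a way to use them with Python-style arbitrarily large integers? I am fine with a performance hit.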

Mar*_*eda 5

By default, NumPy stores elements in a fixed-width numeric dtype. However, you can force the object dtype, like this:

import numpy as np
# dtype=object stores Python ints, so values are not limited to 64 bits
x = np.array([10, 20, 30, 40], dtype=object)
x_exp2 = 1000**x
print(x_exp2)

The output is:

[1000000000000000000000000000000
 1000000000000000000000000000000000000000000000000000000000000
 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
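For contrast, here is a minimal sketch of what happens at the 64-bit boundary mentioned in the question. The variable names (`fixed`, `flexible`, and so on) are only illustrative and not part of the original answer. With an `int64` array, adding 1 to the maximum value wraps around to a negative number, while the object array gives the exact result.

```python
import numpy as np

INT64_MAX = np.iinfo(np.int64).max  # 9223372036854775807

# Fixed-width array: adding 1 to the maximum wraps around silently.
fixed = np.array([INT64_MAX], dtype=np.int64)
wrapped = fixed + 1

# Object array: each element is a Python int, so the result is exact.
flexible = np.array([int(INT64_MAX)], dtype=object)
exact = flexible + 1

print(wrapped[0])  # a negative number (wrap-around)
print(exact[0])    # the exact value 2**63
```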

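The downside is that execution is much slower, because each element is a separate Python int and operations run element by element rather than in vectorized native code.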
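To get a rough feel for the slowdown, a sketch like the following can be used. The array size and repeat count are arbitrary assumptions, and the actual timings depend on the machine and NumPy version, so no numbers are quoted here.

```python
import timeit
import numpy as np

values = list(range(1, 100_001))
native = np.array(values, dtype=np.int64)  # fixed-width 64-bit integers
boxed = np.array(values, dtype=object)     # Python int objects

# Both give the same total here because the values fit in 64 bits.
native_total = int(native.sum())
boxed_total = boxed.sum()

# Time 50 repetitions of each sum; compare the two printed figures.
native_time = timeit.timeit(lambda: native.sum(), number=50)
boxed_time = timeit.timeit(lambda: boxed.sum(), number=50)
print(f"int64:  {native_time:.4f} s")
print(f"object: {boxed_time:.4f} s")
```

If only a few columns actually exceed 64 bits, it may be worth keeping the rest in native dtypes and using object arrays only where needed.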

Edited later to show that np.sum() works (np.prod() is included as well). Of course, there may be some limitations.

import numpy as np
x = np.array([10,20,30,40], dtype=object)
x_exp2 = 1000**x

print(x_exp2)
print(np.sum(x_exp2))
print(np.prod(x_exp2))

The output is:

[1000000000000000000000000000000
 1000000000000000000000000000000000000000000000000000000000000
 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
1000000000000000000000000000001000000000000000000000000000001000000000000000000000000000001000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
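As one example of the limitations mentioned above, ufuncs such as `np.sqrt` do not work on an object array of plain Python ints. A possible workaround, sketched here as an assumption rather than as part of the original answer, is to wrap an exact Python function with `np.frompyfunc`. `math.isqrt` requires Python 3.8 or later, and the name `exact_isqrt` is just illustrative.

```python
import math
import numpy as np

x_exp2 = 1000 ** np.array([10, 20, 30, 40], dtype=object)

# np.sqrt on an object array looks for a .sqrt() method on each element,
# which Python ints do not have, so the call raises an error.
try:
    np.sqrt(x_exp2)
    sqrt_failed = False
except (TypeError, AttributeError):
    sqrt_failed = True
print("np.sqrt failed:", sqrt_failed)

# Workaround: wrap an exact Python function so it applies element-wise.
exact_isqrt = np.frompyfunc(math.isqrt, 1, 1)
roots = exact_isqrt(x_exp2)
print(roots)
```

The result is again an object array, so it carries the same per-element overhead as the input.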