Posts by fou*_*ead

How do I find NaN / infinity / values too large for dtype('float64') in my numpy array?

I am trying to fit a simple machine learning model with scikit learn. On this line:

clf.fit(features, labels)

I get a familiar error:

 Input contains NaN, infinity or a value too large for dtype('float64').

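Every time I have run into this before, there were NaN values in my data. I have confirmed that the data contains no NaNs. The two inputs to the .fit() method (features and labels) are np arrays, but they are generated from Pandas dataframes. Right before extracting the values, I checked for NaN by printing: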

print(features_df[features_df.isnull().any(axis=1)])
print(labels_df[labels_df.isnull().any(axis=1)])

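This printed empty dataframes, so I know there are no rows containing NaN values. I also checked the numpy arrays for NaN values after the conversion, and was even able to sum them successfully with the np sum() method, so there are no NaN values in the features or labels np arrays passed to fit.

That means there must be infinite values or really large values, both of which I find hard to believe. Is there a way to print any values in the dataframe or np array that: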

are NaN, infinity or a value too large for dtype('float64')?

I need them pointed out specifically, because I cannot find them by eye and there are no NaN values.

python numpy nan pandas numpy-dtype

som*_*ode

2020 11-11

8
votes

1
answer

6406
views

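How to match pairs of values contained in two numpy arrays

I have two sets of coordinates and want to find out which coordinates of the coo set are identical to any coordinate in the targets set. I want to know the indices in the coo set, which means I would like to get a list of indices or of bools.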

import numpy as np

coo = np.array([[1,2],[1,6],[5,3],[3,6]]) # coordinates
targets = np.array([[5,3],[1,6]]) # coordinates of targets

print(np.isin(coo,targets))

[[ True False]
 [ True  True]
 [ True  True]
 [ True  True]]

The desired result would be one of the following two:

[False True True False] # bool list
[1,2] # list of concerning indices

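My problem is that ...

np.isin has no axis argument, so I cannot use axis=1.
Even applying a logical AND to each row of the output would return True for the last element, which is wrong.

I am aware of loops and conditions, but I am sure Python comes with a more elegant solution.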
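One loop-free approach is sketched below, using broadcasting rather than `np.isin`. It compares every row of `coo` with every row of `targets` and requires all components of a pair to match. This is only one option and it builds an intermediate array of shape (len(coo), len(targets), 2), so it may not suit very large inputs.

```python
import numpy as np

coo = np.array([[1, 2], [1, 6], [5, 3], [3, 6]])  # coordinates
targets = np.array([[5, 3], [1, 6]])              # coordinates of targets

# Shape (4, 2, 2): element-wise comparison of each coo row with each target row.
pair_equal = coo[:, None, :] == targets[None, :, :]

# A coo row matches a target only if both components are equal (all over axis 2);
# it is a hit if it matches any target (any over axis 1).
row_match = pair_equal.all(axis=2).any(axis=1)

# Indices of the matching rows in coo.
indices = np.flatnonzero(row_match)

print(row_match)
print(indices)
```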

python numpy coordinates numpy-ndarray

Hei*_*ein

2019 02-26

4
votes

1
answer

538
views

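Using numpy array_split() to get a desired split size that is not a sub-multiple

I have a byte array of size 268238 (dtype="uint8"). How can I split it into subarrays of size 2211 each? The remainder array may be smaller.

More generally: for some reason I am trying to use numpy to split a file into chunks of 2211 bytes. (Additional information: afterwards I want to base64_encode each of these 2211-element arrays, but that is only for your information.)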

# create an array to test the problem
import numpy as np
a = np.random.randint(255, size=268238).astype("uint8")
# check size and dtype.
a.size
a.dtype
# until now everything is fine
# now i want to split it in equal parts of 2211 elements
# last one may be smaller
# https://docs.scipy.org/doc/numpy/reference/generated/numpy.array_split.html
# just take the elements size now... 
(np.array_split(a, a.size // 2211))[0].size # <-- 2217... but …
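A possible explanation and workaround, as a sketch: passing an integer to `np.array_split` sets the number of sections, not their size. Here `a.size // 2211` is 121, and 268238 elements spread over 121 sections gives sections of 2217 or 2216 elements. Passing explicit split indices instead yields fixed-size chunks with a smaller remainder. The variable names below are illustrative.

```python
import numpy as np

chunk_size = 2211
a = np.random.randint(255, size=268238).astype("uint8")

# Split positions 2211, 4422, ... up to (but not including) a.size.
split_points = np.arange(chunk_size, a.size, chunk_size)

# With a sequence of indices, array_split cuts exactly at those positions.
chunks = np.array_split(a, split_points)

print(len(chunks))       # number of chunks
print(chunks[0].size)    # size of a full chunk
print(chunks[-1].size)   # size of the remainder chunk
```

For this input the result is 121 full chunks of 2211 elements plus one remainder chunk of 707 elements.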

python numpy numpy-ndarray

MAM*_*AMU

2019 02-17

3
votes

1
answer

1692
views

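An eager interpolation containing just a closure behaves like a lazy interpolation?

As part of learning Groovy, I am trying to explore all the intricate possibilities that string interpolation offers. One of my small experiments gave a result that makes no sense to me, and now I wonder whether I have completely misunderstood the basic concepts of lazy and eager interpolation in Groovy.

Here is the code I ran: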

def myVar1 = 3
// An eager interpolation containing just a closure.
def myStr = "${{->myVar1}}"
print ("Just after the creation of myStr\n")
print (myStr as String)
myVar1 += 1                                           // Bump up myVar1.
print ("\nJust after incrementing myVar1\n")
print (myStr as String)

Here is the output I got:

Just after the creation of myStr
3
Just after incrementing myVar1
4

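Apparently, the closure has been invoked a second time. The only way the closure could be re-executed is for the containing interpolation to be re-evaluated. But the containing interpolation is not itself a closure, although it contains one. So why is it re-evaluated?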

groovy

fou*_*ead

2019 01-07

2
votes

1
answer

809
views

Why does the numpy dot product of a 2-D array with a 1-D array produce a 1-D array?

I tried running code like this:

>>> import numpy as np
>>> A = np.array([[1,2], [3,4], [5,6]])
>>> A.shape
(3, 2)
>>> B = np.array([7,8])
>>> B.shape
(2,)
>>> np.dot(A,B)
array([23, 53, 83])

I think the shape of np.dot(A,B) should be (1,3) rather than (3,).

The result returned for matrices should be:

array([[23],[53],[83]])
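A small sketch of the behaviour in question: when the second argument of `np.dot` is 1-D, NumPy sums over the last axis of `A` and the only axis of `B`, and no extra axis is added to the result, hence the shape (3,). If a column result is wanted, one option is to make `B` an explicit (2, 1) column first. Note that the column form shown in the question has shape (3, 1), not (1, 3).

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)
B = np.array([7, 8])                    # shape (2,)

# 2-D dot 1-D: the result is 1-D.
flat = np.dot(A, B)

# Turning B into a (2, 1) column gives a 2-D (3, 1) result.
column = np.dot(A, B[:, None])

print(flat.shape, column.shape)
print(column)
```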