Leave-one-out estimation in Python

Question

use*_*991 2 python numpy combinatorics scikit-learn

Given a vector x = (x_1, x_2, ..., x_I), I would like to build a matrix whose i-th row is x(i) := (x_1, ..., x_{i-1}, x_{i+1}, ..., x_I), that is, x with its i-th element left out.

I know that

import numpy as np
from sklearn.model_selection import LeaveOneOut  # sklearn.cross_validation was removed in scikit-learn 0.20
I = 30
myrowiterator = LeaveOneOut().split(np.arange(I))
for eachrow, _ in myrowiterator:
    print(eachrow)    # prints [1,2,...,29]
                      #        [0,2,...,29] and so on ...

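provides a routine that yields each row of the matrix above. But I would rather obtain the whole matrix in one step and operate on it directly, instead of looping over its rows. That would save me some computation time.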
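To make the target concrete, here is a minimal NumPy-only sketch of the row-by-row construction that the question wants to avoid. The sample values in `x` and the names `rows` and `matrix` are illustrative assumptions, not part of the original post.

```python
import numpy as np

# Illustrative input vector (values are made up for the example)
x = np.array([10, 20, 30, 40])

# Slow baseline: drop element i to form row i, one row per Python-level iteration
rows = [np.delete(x, i) for i in range(x.size)]

# Stack the I rows of length I-1 into an (I, I-1) matrix
matrix = np.stack(rows)
print(matrix)
```

Each row is a copy of `x` with one entry removed, so the result has shape (I, I-1).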

Answer 1

Jai*_*ime 6

Since you have the numpy tag, the following works:

>>> import numpy as np
>>> N = 5
>>> idx = np.arange(N)
>>> idx = idx[1:] - (idx[:, None] >= idx[1:])
>>> idx
array([[1, 2, 3, 4],
       [0, 2, 3, 4],
       [0, 1, 3, 4],
       [0, 1, 2, 4],
       [0, 1, 2, 3]])
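The one-liner leans on broadcasting, so it may help to see it unpacked. The following is a sketch of the same computation in separate steps; the intermediate names `cols`, `rows`, and `shift` are introduced here for illustration only.

```python
import numpy as np

N = 5
cols = np.arange(1, N)          # candidate indices 1..N-1, shape (N-1,)
rows = np.arange(N)[:, None]    # row numbers as a column, shape (N, 1)

# Broadcasting (N, 1) against (N-1,) gives an (N, N-1) boolean array:
# True wherever the candidate index does not exceed the row number.
shift = rows >= cols

# Subtracting the boolean (treated as 0/1) lowers those entries by one,
# so row i ends up holding 0..i-1 followed by i+1..N-1.
idx = cols - shift
print(idx)
```

In row i, the candidates 1..i are shifted down to 0..i-1, while the candidates i+1..N-1 stay put, which is exactly "every index except i".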

Now you can use it to index any other array:

>>> a = np.array(['a', 'b', 'c', 'd', 'e'])
>>> a[idx]
array([['b', 'c', 'd', 'e'],
       ['a', 'c', 'd', 'e'],
       ['a', 'b', 'd', 'e'],
       ['a', 'b', 'c', 'e'],
       ['a', 'b', 'c', 'd']],
      dtype='<U1')
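As an example of operating on the whole matrix at once, here is a hedged sketch that computes leave-one-out means without a Python loop. The helper name `leave_one_out_index` and the sample data are assumptions made for this illustration; the closed-form line is included only as a cross-check.

```python
import numpy as np

def leave_one_out_index(n):
    """Hypothetical helper: (n, n-1) index array whose row i skips index i."""
    cols = np.arange(1, n)
    return cols - (np.arange(n)[:, None] >= cols)

# Illustrative data
x = np.array([2.0, 4.0, 6.0, 8.0])

# Fancy indexing builds the (4, 3) leave-one-out matrix in one step
loo = x[leave_one_out_index(x.size)]

# One mean per row, i.e. the mean of x with element i removed
loo_means = loo.mean(axis=1)

# Cross-check against the closed form (sum minus the omitted element)
closed_form = (x.sum() - x) / (x.size - 1)
print(loo_means, closed_form)
```

For a plain mean the closed form is cheaper, but the index matrix also works for statistics that have no such shortcut, such as a per-row median.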

EDIT: As @user3820991 suggested, this can be made less cryptic by writing it as:

>>> N = 5
>>> idx = np.arange(1, N) - np.tri(N, N-1, k=-1, dtype=bool)
>>> idx
array([[1, 2, 3, 4],
       [0, 2, 3, 4],
       [0, 1, 3, 4],
       [0, 1, 2, 4],
       [0, 1, 2, 3]])

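The function np.tri is effectively a highly optimized version of the magic comparison in the first version of this answer: it builds its index ranges with the smallest integer type that can hold the array size, and because comparisons in numpy are vectorized using SIMD, the smaller the type, the faster the operation.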
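A quick way to convince yourself that the two formulations are interchangeable is to compare them over a range of sizes. This is only a sanity-check sketch; the function names `idx_compare` and `idx_tri` are made up for the example.

```python
import numpy as np

def idx_compare(n):
    """First version: explicit broadcasted comparison."""
    idx = np.arange(n)
    return idx[1:] - (idx[:, None] >= idx[1:])

def idx_tri(n):
    """Second version: strictly lower-triangular mask from np.tri."""
    return np.arange(1, n) - np.tri(n, n - 1, k=-1, dtype=bool)

# Both constructions should give identical index matrices for every size tried
agree = all(np.array_equal(idx_compare(n), idx_tri(n)) for n in range(2, 11))
print(agree)
```

The check says nothing about speed; timing the two versions on your own array sizes is the only reliable way to see whether the difference matters in practice.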
