Slicing n individual elements in a dask array

Phi*_*nau 4 python arrays numpy dask

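Suppose I have a 3D dask array representing a time series of temperature for the whole United States, [Time, Lat, Lon]. I want to get the tabular time series for 100 different locations. With numpy fancy indexing this would look something like [:, [lat1, lat2...], [lon1, lon2...]]. Dask arrays do not yet allow this kind of indexing. Given that limitation, what is the best way to accomplish this task?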

jim*_*ist 6

Use the vindex indexer. It accepts only point-wise indexing or full slices:

In [1]: import dask.array as da

In [2]: import numpy as np

In [3]: x = np.arange(1000).reshape((10, 10, 10))

In [4]: dx = da.from_array(x, chunks=(5, 5, 5))

In [5]: xcoords = [1, 3, 5]

In [6]: ycoords = [2, 4, 6]

In [7]: x[:, xcoords, ycoords]
Out[7]:
array([[ 12,  34,  56],
       [112, 134, 156],
       [212, 234, 256],
       [312, 334, 356],
       [412, 434, 456],
       [512, 534, 556],
       [612, 634, 656],
       [712, 734, 756],
       [812, 834, 856],
       [912, 934, 956]])

In [8]: dx.vindex[:, xcoords, ycoords].compute()
Out[8]:
array([[ 12, 112, 212, 312, 412, 512, 612, 712, 812, 912],
       [ 34, 134, 234, 334, 434, 534, 634, 734, 834, 934],
       [ 56, 156, 256, 356, 456, 556, 656, 756, 856, 956]])

Some notes:

  • This is not yet available for numpy arrays, but it has been proposed. See the proposal here.

  • This is not fully compatible with numpy fancy indexing, because it always places the new axis at the front. A simple transpose can rearrange the axes:

Example:

In [9]: dx.vindex[:, xcoords, ycoords].T.compute()
Out[9]:
array([[ 12,  34,  56],
       [112, 134, 156],
       [212, 234, 256],
       [312, 334, 356],
       [412, 434, 456],
       [512, 534, 556],
       [612, 634, 656],
       [712, 734, 756],
       [812, 834, 856],
       [912, 934, 956]])
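Applied to the [Time, Lat, Lon] layout from the question, the same approach might look like the sketch below. The array contents, grid size, chunk sizes, and the names temps, lat_idx, and lon_idx are illustrative assumptions, not part of the original question; real data would use one index pair per location of interest.

```python
import dask.array as da
import numpy as np

# Hypothetical stand-in for the temperature data: 8 time steps on a 6 x 7 grid.
n_time, n_lat, n_lon = 8, 6, 7
temps_np = np.arange(n_time * n_lat * n_lon, dtype=float).reshape(n_time, n_lat, n_lon)
temps = da.from_array(temps_np, chunks=(4, 3, 4))

# One (lat, lon) index pair per location; the values here are made up.
lat_idx = [0, 2, 5, 3]
lon_idx = [1, 6, 0, 3]

# vindex puts the point axis first, so this has shape (n_points, n_time).
by_point = temps.vindex[:, lat_idx, lon_idx]

# Transpose to get one row per time step and one column per location.
table = by_point.T.compute()

# For comparison: numpy fancy indexing on the in-memory array.
expected = temps_np[:, lat_idx, lon_idx]
```

Here table has one row per time step and one column per location, which matches the layout numpy fancy indexing would return for the same index lists.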