How to efficiently convert a 4D numpy array into a pandas DataFrame with the indices as columns?

Eri*_*ang 2 python arrays numpy python-2.7 pandas

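I have a 4D numpy array of shape (4, 155, 240, 240). I want to create a pandas DataFrame with one row per element of this array and five columns: one for each of the four indices and one for the value stored in the array. The code I am using now looks like this: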

import pandas as pd
import numpy as np

# some array of this shape
im = np.zeros((4, 155, 240, 240))

df = {col: [] for col in ['mode', 'x', 'y', 'z', 'val']}
for idx, val in np.ndenumerate(im):
    df['mode'].append(idx[0])
    df['y'].append(idx[1])
    df['x'].append(idx[2])
    df['z'].append(idx[3])
    df['val'].append(val)
df = pd.DataFrame(df)

Is there a way to do this more efficiently, possibly using vectorized operations?

Psi*_*dom 5

It seems you need the indices of the elements; you can try numpy.meshgrid:

arr = np.column_stack(list(map(np.ravel, np.meshgrid(*map(np.arange, im.shape), indexing="ij"))) + [im.ravel()])

arr
#array([[   0.,    0.,    0.,    0.,    0.],
#       [   0.,    0.,    0.,    1.,    0.],
#       [   0.,    0.,    0.,    2.,    0.],
#       ..., 
#       [   3.,  154.,  239.,  237.,    0.],
#       [   3.,  154.,  239.,  238.,    0.],
#       [   3.,  154.,  239.,  239.,    0.]])

Then construct a DataFrame from it:

pd.DataFrame(arr, columns = ['mode', 'x', 'y', 'z', 'val'])
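
One side effect of np.column_stack is that the integer indices are upcast to float so they can share a single array with the values (hence the `0.` entries in the output above). If integer index columns matter, one possible variant is to build the index grids with np.indices and hand pandas one array per column. This is only a sketch, not part of the answer itself; the helper name `array_to_long_frame` is hypothetical, and the index grids still cost roughly as much memory as the meshgrid version.

```python
import numpy as np
import pandas as pd


def array_to_long_frame(arr, index_names, value_name="val"):
    """Return one row per element of `arr`: its index along each axis plus its value.

    Hypothetical helper for illustration; `index_names` needs one name per axis.
    """
    if len(index_names) != arr.ndim:
        raise ValueError("need one column name per array axis")
    # np.indices gives an array of shape (ndim, *arr.shape); reshaping to
    # (ndim, -1) flattens every index grid in C order, the same order that
    # arr.ravel() uses, so row i of each grid lines up with element i.
    grids = np.indices(arr.shape).reshape(arr.ndim, -1)
    columns = {name: grid for name, grid in zip(index_names, grids)}
    columns[value_name] = arr.ravel()
    # Passing `columns=` fixes the column order explicitly.
    return pd.DataFrame(columns, columns=list(index_names) + [value_name])


# Small array for demonstration; the same call applies to the 4D `im` above.
small = np.arange(24, dtype=float).reshape(2, 3, 2, 2)
frame = array_to_long_frame(small, ["mode", "x", "y", "z"])
print(frame.dtypes)
print(frame.head())
```

Because each column is stored separately, the index columns stay integer while `val` keeps the dtype of the array.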

Timing comparison with a plain for loop over np.ndenumerate:

mesh = pd.DataFrame(np.column_stack(list(map(np.ravel, np.meshgrid(*map(np.arange, im.shape), indexing="ij"))) + [im.ravel()]),
                   columns=["mode", "x", "y", "z", "val"])

loop = pd.DataFrame([index + (x,) for index, x in np.ndenumerate(im)], columns=["mode", "x", "y", "z", "val"])

(loop.values == mesh.values).all()
# True

%timeit mesh = pd.DataFrame(np.column_stack(list(map(np.ravel, np.meshgrid(*map(np.arange, im.shape), indexing="ij"))) + [im.ravel()]), columns=["mode", "x", "y", "z", "val"])
# 1 loop, best of 3: 2.07 s per loop

%timeit loop = pd.DataFrame([index + (x,) for index, x in np.ndenumerate(im)], columns=["mode", "x", "y", "z", "val"])
# 1 loop, best of 3: 1min 2s per loop
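
The `%timeit` lines above only work inside IPython or Jupyter. To repeat the comparison in a plain Python script, something along these lines should work with the standard-library timeit module. It is a sketch on a deliberately small array so that it finishes quickly; the function names `build_with_meshgrid` and `build_with_loop` are made up for this example, and the absolute numbers will differ from the ones quoted above.

```python
import timeit

import numpy as np
import pandas as pd

COLUMNS = ["mode", "x", "y", "z", "val"]


def build_with_meshgrid(arr):
    # Vectorized: one index grid per axis, flattened and stacked next to the values.
    grids = np.meshgrid(*map(np.arange, arr.shape), indexing="ij")
    stacked = np.column_stack([g.ravel() for g in grids] + [arr.ravel()])
    return pd.DataFrame(stacked, columns=COLUMNS)


def build_with_loop(arr):
    # Python-level loop: one (i, j, k, l, value) tuple per element.
    rows = [index + (value,) for index, value in np.ndenumerate(arr)]
    return pd.DataFrame(rows, columns=COLUMNS)


# Much smaller than (4, 155, 240, 240) so the loop version runs in reasonable time.
small = np.random.rand(2, 5, 6, 6)

# Both approaches should yield the same table.
same = (build_with_meshgrid(small).values == build_with_loop(small).values).all()
print("identical:", same)

repeats = 20
for build in (build_with_meshgrid, build_with_loop):
    seconds = timeit.timeit(lambda: build(small), number=repeats)
    print(build.__name__, seconds / repeats, "s per call")
```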