Eri*_*ang 2 python arrays numpy python-2.7 pandas
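I have a 4D numpy array of shape (4, 155, 240, 240). I want to create a pandas DataFrame with one row per element of this array and five columns: one for each of the four indices and one for the value in the array. The code I am currently using looks like this: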
import pandas as pd
import numpy as np
# some array of this shape
im = np.zeros((4, 155, 240, 240))
df = {col: [] for col in ['mode', 'x', 'y', 'z', 'val']}
for idx, val in np.ndenumerate(im):
    df['mode'].append(idx[0])
    df['y'].append(idx[1])
    df['x'].append(idx[2])
    df['z'].append(idx[3])
    df['val'].append(val)
df = pd.DataFrame(df)
Is there a more efficient way to do this, perhaps using vectorized operations?
It seems you need the indices of the elements; you could try numpy.meshgrid:
arr = np.column_stack(list(map(np.ravel, np.meshgrid(*map(np.arange, im.shape), indexing="ij"))) + [im.ravel()])
arr
#array([[ 0., 0., 0., 0., 0.],
# [ 0., 0., 0., 1., 0.],
# [ 0., 0., 0., 2., 0.],
# ...,
# [ 3., 154., 239., 237., 0.],
# [ 3., 154., 239., 238., 0.],
# [ 3., 154., 239., 239., 0.]])
Then construct a DataFrame from it:
pd.DataFrame(arr, columns = ['mode', 'x', 'y', 'z', 'val'])
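One side effect of np.column_stack is that the index columns are upcast to float, because they share a single array with the values (this is why the output above shows 0., 1., 2.). If integer index columns are preferred, a minimal sketch using np.indices could look like the following. The helper name array_to_long_frame and the small demo array are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def array_to_long_frame(arr, index_names, value_name="val"):
    """Hypothetical helper: one row per element, integer index columns plus the value."""
    if len(index_names) != arr.ndim:
        raise ValueError("need one index name per array dimension")
    # np.indices returns an integer array of shape (ndim, *arr.shape).
    # Reshaping to (ndim, -1) flattens each index grid in C order,
    # which is the same order arr.ravel() uses for the values.
    grids = np.indices(arr.shape).reshape(arr.ndim, -1)
    data = {name: grid for name, grid in zip(index_names, grids)}
    data[value_name] = arr.ravel()
    # Each column keeps its own dtype because the frame is built from a dict.
    return pd.DataFrame(data, columns=list(index_names) + [value_name])

# Small demo array (an assumption) so the example runs quickly.
small = np.arange(24, dtype=float).reshape(2, 3, 2, 2)
df_small = array_to_long_frame(small, ["mode", "x", "y", "z"])
```

Building the frame from a dict of 1-D columns avoids the shared float array, at the cost of holding one integer index array per dimension in memory.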
Timing comparison against a plain for loop over np.ndenumerate:
mesh = pd.DataFrame(np.column_stack(list(map(np.ravel, np.meshgrid(*map(np.arange, im.shape), indexing="ij"))) + [im.ravel()]),
                    columns=["mode", "x", "y", "z", "val"])
loop = pd.DataFrame([index + (x,) for index, x in np.ndenumerate(im)], columns=["mode", "x", "y", "z", "val"])
(loop.values == mesh.values).all()
# True
%timeit mesh = pd.DataFrame(np.column_stack(list(map(np.ravel, np.meshgrid(*map(np.arange, im.shape), indexing="ij"))) + [im.ravel()]), columns=["mode", "x", "y", "z", "val"])
# 1 loop, best of 3: 2.07 s per loop
%timeit loop = pd.DataFrame([index + (x,) for index, x in np.ndenumerate(im)], columns=["mode", "x", "y", "z", "val"])
# 1 loop, best of 3: 1min 2s per loop
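The %timeit magic only works inside IPython or Jupyter. To repeat the comparison in a plain Python script, a rough sketch with the standard timeit module might look like this. The function names build_mesh and build_loop and the reduced array shape are assumptions chosen so the loop version finishes quickly; absolute timings will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

COLUMNS = ["mode", "x", "y", "z", "val"]

def build_mesh(im):
    # Vectorized version: one flattened index grid per axis, plus the values.
    grids = np.meshgrid(*map(np.arange, im.shape), indexing="ij")
    stacked = np.column_stack([g.ravel() for g in grids] + [im.ravel()])
    return pd.DataFrame(stacked, columns=COLUMNS)

def build_loop(im):
    # Pure-Python version: one tuple per element via np.ndenumerate.
    return pd.DataFrame([index + (x,) for index, x in np.ndenumerate(im)],
                        columns=COLUMNS)

# Reduced shape (an assumption) so the loop version runs in well under a second.
im_small = np.random.rand(4, 10, 12, 12)

# Both constructions should produce the same rows in the same order.
same = bool((build_loop(im_small).values == build_mesh(im_small).values).all())

# Best of three single runs for each version, in seconds.
t_mesh = min(timeit.repeat(lambda: build_mesh(im_small), number=1, repeat=3))
t_loop = min(timeit.repeat(lambda: build_loop(im_small), number=1, repeat=3))
print(same, t_mesh, t_loop)
```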