Simple way to convert a multidimensional numpy ndarray to a pandas DataFrame?

Asked by TNT*_*TNT, score 3. Tags: numpy, multidimensional-array, multi-index, dataframe, pandas

I have a 4-D numpy.ndarray, for example:

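import numpy as np

myarr = np.random.rand(10, 4, 3, 2)
# labels for each axis, in axis order
dims = {'time': list(range(1, 11)), 'sub': list(range(1, 5)),
        'cond': ['A', 'B', 'C'], 'measure': ['meas1', 'meas2']}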

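but it could have more dimensions. How can I create a pandas.DataFrame with a MultiIndex just by passing the dimensions as indices, without further manual fiddling (reshaping the ndarray into a 2-D shape)?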

I can't get my head around reshaping, not even really with 3 dimensions, so I'm looking for an "automatic" approach if possible.

Is there a function that takes the row/column dimension assignments and creates the DataFrame? Something like:

df=nd2df(myarr,dim2row=[0,1],dim2col=[2,3],rowlab=['time','sub'],collab=['cond','measure'])

and get something like:

              meas1             meas2
              A     B     C     A    B    C
sub   time
  1      1
         2
         3
         .
         .
  2      1
         2
 ...

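If this can't be done (or isn't practical to do) automatically, an explanation more concise than the MultiIndex section of the pandas docs would be appreciated.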

I can't even get it right when I don't care about the order of the dimensions. For example, I expected this to work:

import numpy as np
import pandas as pd

a = np.arange(24).reshape((3, 2, 2, 2))
iterables = [[1, 2, 3], [1, 2], ['m1', 'm2'], ['A', 'B']]
index = pd.MultiIndex.from_product(iterables, names=['time', 'sub', 'meas', 'cond'])

pd.DataFrame(a.reshape(2*3*1, 2*2), index)

which gives:

ValueError: Shape of passed values is (4, 6), indices imply (4, 24)

Answer by piR*_*red, score 5

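You get the error because you reshaped the ndarray to 6x4 but applied an index designed to capture all four dimensions in a single Series (3*2*2*2 = 24 entries). Here is the setup that makes the toy example work: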

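import numpy as np
import pandas as pd

a = np.arange(24).reshape((3, 2, 2, 2))
iterables = [[1, 2, 3], [1, 2], ['m1', 'm2'], ['A', 'B']]
index = pd.MultiIndex.from_product(iterables, names=['time', 'sub', 'meas', 'cond'])

# 24 index entries, so 24 rows with a single column
pd.DataFrame(a.reshape(24, 1), index=index)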
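With the full 24-entry index in place, one possible way to reach the wide layout asked for in the question (one row per time/sub pair, one column per meas/cond pair) is to build a Series and then move the last two index levels into the columns with Series.unstack. This is a sketch using only standard pandas calls; it assumes the array's axes are in the same order as the index levels.

```python
import numpy as np
import pandas as pd

a = np.arange(24).reshape((3, 2, 2, 2))
iterables = [[1, 2, 3], [1, 2], ['m1', 'm2'], ['A', 'B']]
index = pd.MultiIndex.from_product(iterables, names=['time', 'sub', 'meas', 'cond'])

# ravel() flattens in C order (last axis varies fastest), which is the same
# order in which MultiIndex.from_product enumerates combinations.
s = pd.Series(a.ravel(), index=index)

# Move the 'meas' and 'cond' levels from the row index into the columns.
wide = s.unstack(['meas', 'cond'])
print(wide)
```

The result has the row levels time and sub and the column levels meas and cond, so no explicit 2-D reshape is needed.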

Solution

Here is a generic DataFrame creator that builds the row and column MultiIndexes from lists of labels:

import pandas as pd

def produce_df(rows, columns, row_names=None, column_names=None):
    """rows is a list of lists that will be used to build a MultiIndex
    columns is a list of lists that will be used to build a MultiIndex"""
    row_index = pd.MultiIndex.from_product(rows, names=row_names)
    col_index = pd.MultiIndex.from_product(columns, names=column_names)
    # No data is passed, so every cell is NaN
    return pd.DataFrame(index=row_index, columns=col_index)
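produce_df builds the indexes but does not decide which array axes become rows and which become columns, which is what the nd2df call imagined in the question needs. A minimal sketch of such a helper, assuming a hypothetical signature nd2df(arr, dim2row, dim2col, levels, rowlab, collab) where levels[i] lists the labels of axis i (this is not a pandas or NumPy function): np.transpose puts the row axes first, then a plain C-order reshape collapses them to 2-D.

```python
import numpy as np
import pandas as pd


def nd2df(arr, dim2row, dim2col, levels, rowlab=None, collab=None):
    """Hypothetical helper (not a pandas API).

    dim2row / dim2col: axis numbers of `arr` that become row / column levels,
    in the order the levels should appear.
    levels: levels[i] is the list of labels for axis i of `arr`.
    """
    # Reorder axes so that all row axes come first, then all column axes.
    moved = np.transpose(arr, list(dim2row) + list(dim2col))
    n_rows = int(np.prod([arr.shape[d] for d in dim2row]))
    n_cols = int(np.prod([arr.shape[d] for d in dim2col]))
    # from_product enumerates combinations with the last level varying fastest,
    # which matches NumPy's default C-order reshape below.
    row_index = pd.MultiIndex.from_product([levels[d] for d in dim2row], names=rowlab)
    col_index = pd.MultiIndex.from_product([levels[d] for d in dim2col], names=collab)
    return pd.DataFrame(moved.reshape(n_rows, n_cols), index=row_index, columns=col_index)


# Usage with the question's axes: time=0, sub=1, cond=2, measure=3.
myarr = np.random.rand(10, 4, 3, 2)
levels = [list(range(1, 11)), list(range(1, 5)), ['A', 'B', 'C'], ['meas1', 'meas2']]
df = nd2df(myarr, dim2row=[1, 0], dim2col=[3, 2], levels=levels,
           rowlab=['sub', 'time'], collab=['measure', 'cond'])
print(df.head())
```

Choosing dim2row=[1, 0] and dim2col=[3, 2] gives sub as the outer row level and measure as the top column level, matching the layout sketched in the question. np.transpose raises an error if dim2row and dim2col together are not a permutation of all axes.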

Demonstration

Without named index levels

produce_df([['a', 'b'], ['c', 'd']], [['1', '2'], ['3', '4']])

       1         2     
       3    4    3    4
a c  NaN  NaN  NaN  NaN
  d  NaN  NaN  NaN  NaN
b c  NaN  NaN  NaN  NaN
  d  NaN  NaN  NaN  NaN

With named index levels

produce_df([['a', 'b'], ['c', 'd']], [['1', '2'], ['3', '4']],
           row_names=['alpha1', 'alpha2'], column_names=['number1', 'number2'])

number1          1         2     
number2          3    4    3    4
alpha1 alpha2                    
a      c       NaN  NaN  NaN  NaN
       d       NaN  NaN  NaN  NaN
b      c       NaN  NaN  NaN  NaN
       d       NaN  NaN  NaN  NaN
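The demonstration frames are all NaN because produce_df passes no data to the constructor. A hypothetical variant with an extra data argument (an extension for illustration, not part of the original answer) could fill the frame directly, assuming the array's axes are already ordered with all row dimensions before all column dimensions:

```python
import numpy as np
import pandas as pd


def produce_df(rows, columns, row_names=None, column_names=None, data=None):
    """Variant of produce_df with an optional `data` argument (hypothetical).

    `data` needs one element per (row, column) cell; it is reshaped in C order,
    so its axes must already be ordered as the row levels followed by the
    column levels.
    """
    row_index = pd.MultiIndex.from_product(rows, names=row_names)
    col_index = pd.MultiIndex.from_product(columns, names=column_names)
    if data is not None:
        data = np.asarray(data).reshape(len(row_index), len(col_index))
    return pd.DataFrame(data, index=row_index, columns=col_index)


# A 2x2x2x2 array: the first two axes map to the rows, the last two to the columns.
arr = np.arange(16).reshape(2, 2, 2, 2)
df = produce_df([['a', 'b'], ['c', 'd']], [['1', '2'], ['3', '4']],
                row_names=['alpha1', 'alpha2'], column_names=['number1', 'number2'],
                data=arr)
print(df)
```

Leaving data as None keeps the original behaviour of an all-NaN frame. If the array's axes are in a different order, they have to be transposed first.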