Mean of a correlation matrix - pandas DataFrame

use*_*092 4 python pandas

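I have a large correlation matrix in a pandas DataFrame: df (342, 342).

How can I get the mean, standard deviation, etc. of all the numbers in the upper triangle, excluding the 1s along the diagonal?

Thanks.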

Zel*_*ny7 5

Another possible answer:

In [1]: corr
Out[1]:
          a         b         c         d         e
a  1.000000  0.022246  0.018614  0.022592  0.008520
b  0.022246  1.000000  0.033029  0.049714 -0.008243
c  0.018614  0.033029  1.000000 -0.016244  0.049010
d  0.022592  0.049714 -0.016244  1.000000 -0.015428
e  0.008520 -0.008243  0.049010 -0.015428  1.000000

In [2]: corr.values[np.triu_indices_from(corr.values,1)].mean()
Out[2]: 0.016381
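The question also asks for the standard deviation and other statistics, not just the mean. A minimal sketch, assuming the 5×5 `corr` above is rebuilt by hand from the printed values: the same `np.triu_indices_from(..., k=1)` selection is wrapped in a `pandas.Series`, and `describe()` returns count, mean, std, min, quartiles and max in one call. The helper name `upper_triangle_values` is hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the 5x5 correlation matrix printed above (values copied from the output).
labels = list("abcde")
corr = pd.DataFrame(
    [
        [1.000000, 0.022246, 0.018614, 0.022592, 0.008520],
        [0.022246, 1.000000, 0.033029, 0.049714, -0.008243],
        [0.018614, 0.033029, 1.000000, -0.016244, 0.049010],
        [0.022592, 0.049714, -0.016244, 1.000000, -0.015428],
        [0.008520, -0.008243, 0.049010, -0.015428, 1.000000],
    ],
    index=labels,
    columns=labels,
)


def upper_triangle_values(frame: pd.DataFrame) -> np.ndarray:
    """Strictly upper-triangular entries; k=1 skips the diagonal of 1s."""
    arr = frame.to_numpy()
    return arr[np.triu_indices_from(arr, k=1)]


vals = upper_triangle_values(corr)
print(vals.mean())                 # same statistic as In [2]
print(vals.std(ddof=1))            # sample standard deviation
print(pd.Series(vals).describe())  # count, mean, std, min, quartiles, max
```

Note that `np.ndarray.std` defaults to `ddof=0` (population standard deviation), while `pandas.Series.std` and `describe()` use `ddof=1`, so pass `ddof` explicitly if the two must agree.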

Edit: added performance figures

Performance of my solution:

In [3]: %timeit corr.values[np.triu_indices_from(corr.values,1)].mean()
10000 loops, best of 3: 48.1 us per loop
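`%timeit` is an IPython magic; outside IPython, the standard-library `timeit` module gives a comparable best-of-N measurement. This is a sketch under stated assumptions: the 342×342 matrix is a hypothetical stand-in built from random data with `np.corrcoef`, and the names `corr_big` and `triu_mean` are illustrative. Absolute timings will differ from the figures in this answer and depend on the machine and on the NumPy/pandas versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's 342x342 correlation matrix.
rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 342))
corr_big = pd.DataFrame(np.corrcoef(data, rowvar=False))


def triu_mean(frame: pd.DataFrame) -> float:
    """Mean of the strictly upper triangle (diagonal excluded)."""
    arr = frame.to_numpy()
    return float(arr[np.triu_indices_from(arr, k=1)].mean())


# Best of 3 repeats of n calls each, reported per call like %timeit.
n = 200
best = min(timeit.repeat(lambda: triu_mean(corr_big), repeat=3, number=n))
print(f"{best / n * 1e6:.1f} us per loop")
```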

Performance of Theodros Zelleke's one-liner:

In [4]: %timeit corr.unstack().ix[zip(*np.triu_indices_from(corr, 1))].mean()
1000 loops, best of 3: 823 us per loop
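The `.ix` indexer used in In [4] was deprecated in pandas 0.20 and removed in pandas 1.0, so that line no longer runs on current pandas. The advantage of the `unstack()` approach is that each value keeps its (row, column) labels. One possible substitute, not Theodros Zelleke's original code, builds that labelled Series directly from the positional upper-triangle indices; the function name `upper_pairs` and the 3×3 `demo` frame are hypothetical.

```python
import numpy as np
import pandas as pd


def upper_pairs(frame: pd.DataFrame) -> pd.Series:
    """Strict upper triangle as a Series indexed by (row label, column label)."""
    arr = frame.to_numpy()
    rows, cols = np.triu_indices_from(arr, k=1)
    index = pd.MultiIndex.from_arrays([frame.index[rows], frame.columns[cols]])
    return pd.Series(arr[rows, cols], index=index)


labels = list("abc")
demo = pd.DataFrame(
    [[1.0, 0.2, 0.4], [0.2, 1.0, 0.6], [0.4, 0.6, 1.0]],
    index=labels,
    columns=labels,
)
pairs = upper_pairs(demo)
print(pairs)         # index pairs (a, b), (a, c), (b, c)
print(pairs.mean())
```

Because the labels survive, the same Series can also be sorted to find, for example, the most strongly correlated pairs.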

Performance of DSM's solution:

In [5]: def method1(df):
   ...:     df2 = df.copy()
   ...:     df2.values[np.tril_indices_from(df2)] = np.nan
   ...:     return df2.unstack().mean()
   ...:

In [6]: %timeit method1(corr)
1000 loops, best of 3: 242 us per loop
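`method1` works by writing NaN through `df2.values`, which assumes `.values` is a writable view of the frame's data. That held for a single-dtype frame in the pandas versions of the time. Under Copy-on-Write (opt-in in pandas 2.x and the default from pandas 3.0), `.values` can instead return a read-only array, so the in-place assignment may fail. A sketch of the same idea that avoids writing through `.values` uses `DataFrame.where` with a boolean mask. The name `method1_where` is hypothetical, and its performance relative to the figures above has not been measured.

```python
import numpy as np
import pandas as pd


def method1_where(df: pd.DataFrame) -> float:
    # True strictly above the diagonal; where() turns everything else into NaN.
    keep = np.triu(np.ones(df.shape, dtype=bool), k=1)
    upper = df.where(keep)
    # unstack() flattens to a Series; mean() skips NaN by default.
    return upper.unstack().mean()


labels = list("abc")
demo = pd.DataFrame(
    [[1.0, 0.2, 0.4], [0.2, 1.0, 0.6], [0.4, 0.6, 1.0]],
    index=labels,
    columns=labels,
)
print(method1_where(demo))
```

`where` returns a new frame, so the input is left untouched and the explicit `df.copy()` from `method1` is no longer needed.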