Posts by Chi*_*Lin

Python SciPy Spearman correlation matrix matches neither the two-array correlation nor pandas.DataFrame.corr()

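I am computing the Spearman correlation of a matrix. With scipy.stats.spearmanr, passing the whole matrix gives different results from passing two arrays (columns) at a time. Both also differ from the result of pandas.DataFrame.corr.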

from scipy.stats import spearmanr # scipy 1.0.1
import pandas as pd # 0.22.0
import numpy as np
# Data: columns B-E contain missing values (NaN)
X = pd.DataFrame({"A":[-0.4,1,12,78,84,26,0,0], "B":[-0.4,3.3,54,87,25,np.nan,0,1.2], "C":[np.nan,56,78,0,np.nan,143,11,np.nan], "D":[0,-9.3,23,72,np.nan,-2,-0.3,-0.4], "E":[78,np.nan,np.nan,0,-1,-11,1,323]})
matrix_rho_scipy = spearmanr(X,nan_policy='omit',axis=0)[0]
matrix_rho_pandas = X.corr('spearman')
print(matrix_rho_scipy == matrix_rho_pandas.values) # All False except diagonal
print(spearmanr(X['A'],X['B'],nan_policy='omit',axis=0)[0]) # 0.8839285714285714 from scipy 1.0.1
print(spearmanr(X['A'],X['B'],nan_policy='omit',axis=0)[0]) # 0.8829187134416477 from scipy 1.1.0
print(matrix_rho_scipy[0,1]) # 0.8263621207201486
print(matrix_rho_pandas.values[0,1]) # 0.8829187134416477
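The two scipy values for A and B in the comments above can be reproduced by hand, which hints at where the version difference comes from. The sketch below assumes pairwise deletion (only the row where B is NaN is dropped) and compares two ways of computing rho: the classic shortcut 1 - 6*sum(d^2) / (n*(n^2 - 1)), which is exact only without ties, and the Pearson correlation of average ranks, which handles ties (A contains 0 twice). That the first matches the scipy 1.0.1 number and the second matches the scipy 1.1.0 / pandas number is an inference from the arithmetic, not a statement about scipy's source code.

```python
import numpy as np
from scipy.stats import rankdata

# Columns A and B from the question
a = np.array([-0.4, 1, 12, 78, 84, 26, 0, 0])
b = np.array([-0.4, 3.3, 54, 87, 25, np.nan, 0, 1.2])

# Pairwise deletion: keep only rows where both values are present
mask = ~np.isnan(a) & ~np.isnan(b)
ra = rankdata(a[mask])  # average ranks: the two zeros in A share rank 2.5
rb = rankdata(b[mask])
n = int(mask.sum())

# Classic shortcut formula, exact only when there are no ties
rho_shortcut = 1 - 6 * np.sum((ra - rb) ** 2) / (n * (n ** 2 - 1))

# Pearson correlation of the average ranks, which stays correct with ties
rho_ranks = np.corrcoef(ra, rb)[0, 1]

print(rho_shortcut)  # ≈ 0.8839285714 (matches the scipy 1.0.1 comment)
print(rho_ranks)     # ≈ 0.8829187134 (matches scipy 1.1.0 and pandas)
```

With 7 retained rows the shortcut gives 1 - 6*6.5/336 = 297/336, while the rank-based Pearson gives 24.5/sqrt(770); the gap between them comes entirely from the tied zeros in A.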

Later I found that the rho from pandas is the same as the rho from R.

X = data.frame(A=c(-0.4,1,12,78,84,26,0,0), 
  B=c(-0.4,3.3,54,87,25,NaN,0,1.2), C=c(NaN,56,78,0,NaN, 143,11,NaN), 
  D=c(0,-9.3,23,72,NaN,-2,-0.3,-0.4), E=c(78,NaN,NaN,0,-1,-11,1,323)) 
cor.test(X$A,X$B,method='spearman', exact = FALSE, na.action="na.omit") # 0.8829187 
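Since pandas and R agree on A and B, it may help to spell out what that shared value corresponds to for every column pair. A minimal sketch, assuming (as the matching number suggests) that both drop rows pairwise, re-rank the remaining values with average ranks for ties, and take the Pearson correlation of those ranks. `pairwise_spearman` is a hypothetical helper, not a pandas or scipy function.

```python
import numpy as np
import pandas as pd
from scipy.stats import rankdata


def pairwise_spearman(df):
    """Spearman matrix with pairwise deletion of NaNs (hypothetical helper)."""
    cols = list(df.columns)
    out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for i, ci in enumerate(cols):
        for cj in cols[i + 1:]:
            # Keep only rows where both columns are present
            pair = df[[ci, cj]].dropna()
            # Re-rank inside the retained rows; ties get average ranks
            ri = rankdata(pair[ci])
            rj = rankdata(pair[cj])
            # Spearman rho = Pearson correlation of the ranks
            rho = np.corrcoef(ri, rj)[0, 1]
            out.loc[ci, cj] = rho
            out.loc[cj, ci] = rho
    return out


X = pd.DataFrame({"A": [-0.4, 1, 12, 78, 84, 26, 0, 0],
                  "B": [-0.4, 3.3, 54, 87, 25, np.nan, 0, 1.2],
                  "C": [np.nan, 56, 78, 0, np.nan, 143, 11, np.nan],
                  "D": [0, -9.3, 23, 72, np.nan, -2, -0.3, -0.4],
                  "E": [78, np.nan, np.nan, 0, -1, -11, 1, 323]})
rho = pairwise_spearman(X)
print(rho.loc["A", "B"])  # ≈ 0.8829187134, the pandas/R value
print(np.allclose(rho.values, X.corr(method="spearman").values))
```

Because each pair keeps a different set of rows, the ranks have to be recomputed per pair, so this loop is O(k^2) in the number of columns k and is slow on wide tables.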

However, pandas' corr cannot be used for large tables (for example, here …
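For large tables, one commonly used shortcut, offered here as a hedged sketch rather than a fix for the question, only applies when the data has no missing values: each column then needs to be ranked just once, and the whole Spearman matrix is the Pearson correlation matrix of those ranks. It does not reproduce pairwise deletion when NaNs are present, because dropping rows for one pair changes the ranks used for that pair. The random data below is synthetic and purely illustrative.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 50))  # synthetic complete data, no NaNs

# Rank every column once (average ranks for ties), then correlate the ranks
ranks = np.apply_along_axis(rankdata, 0, data)
rho_fast = np.corrcoef(ranks, rowvar=False)

# With more than two columns, spearmanr on a 2-D array returns the full matrix
rho_scipy = spearmanr(data)[0]
print(np.allclose(rho_fast, rho_scipy))
```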

