我想用热图表示相关矩阵.R中有一个叫做correlogram的东西,但我不认为Python中有这样的东西.
我怎样才能做到这一点?值从-1到1,例如:
[[ 1. 0.00279981 0.95173379 0.02486161 -0.00324926 -0.00432099]
[ 0.00279981 1. 0.17728303 0.64425774 0.30735071 0.37379443]
[ 0.95173379 0.17728303 1. 0.27072266 0.02549031 0.03324756]
[ 0.02486161 0.64425774 0.27072266 1. 0.18336236 0.18913512]
[-0.00324926 0.30735071 0.02549031 0.18336236 1. 0.77678274]
[-0.00432099 0.37379443 0.03324756 0.18913512 0.77678274 1. ]]
Run Code Online (Sandbox Code Playgroud)
我能够根据另一个问题生成以下热图,但问题是我的值被'切'为0,所以我希望有一个从蓝色(-1)到红色(1)的地图,或者类似的东西,但这里低于0的值没有以适当的方式呈现.
这是代码:
plt.imshow(correlation_matrix,cmap='hot',interpolation='nearest')
Run Code Online (Sandbox Code Playgroud) 我正在努力按照pabas中的数据帧按照seaborn的热图(或matplotlib)的正确格式来制作热图.
我当前的数据框(称为data_yule)是:
Unnamed: 0 SymmetricDivision test MutProb value
3 3 1.0 sackin_yule 0.100 -4.180864
8 8 1.0 sackin_yule 0.050 -9.175349
13 13 1.0 sackin_yule 0.010 -11.408114
18 18 1.0 sackin_yule 0.005 -10.502450
23 23 1.0 sackin_yule 0.001 -8.027475
28 28 0.8 sackin_yule 0.100 -0.722602
33 33 0.8 sackin_yule 0.050 -6.996394
38 38 0.8 sackin_yule 0.010 -10.536340
43 43 0.8 sackin_yule 0.005 -9.544065
48 48 0.8 sackin_yule 0.001 -7.196407
53 53 0.6 sackin_yule 0.100 -0.392256
58 58 0.6 sackin_yule 0.050 -6.621639 …Run Code Online (Sandbox Code Playgroud) 我有一个相关矩阵计算如下pyspark 2.2:
from pyspark.ml.linalg import Vectors
from pyspark.ml.stat import Correlation
from pyspark.ml.linalg import Vectors
from pyspark.ml.feature import VectorAssembler
datos = sql("""select * from proceso_riesgos.jdgc_bd_train_mn_ingresos""")
Variables_corr= ['ingreso_final_mix','ingreso_final_promedio',
'ingreso_final_mediana','ingreso_final_trimedia','ingresos_serv_q1',
'ingresos_serv_q2','ingresos_serv_q3','prom_ingresos_serv','y_correc']
assembler = VectorAssembler(
inputCols=Variables_corr,
outputCol="features")
datos1=datos.select(Variables_corr).filter("y_correc is not null")
output = assembler.transform(datos)
r1 = Correlation.corr(output, "features")
Run Code Online (Sandbox Code Playgroud)
结果是一个带有变量的数据框,称为"pearson(features):matrix":
Row(pearson(features)=DenseMatrix(20, 20, [1.0, 0.9428, 0.8908, 0.913,
0.567, 0.5832, 0.6148, 0.6488, ..., -0.589, -0.6145, -0.5906, -0.5534,
-0.5346, -0.0797, -0.617, 1.0], False))]
Run Code Online (Sandbox Code Playgroud)
我需要获取这些值并将其导出到excel,或者能够操纵结果.列表可能是令人沮丧的.
感谢帮助!!
除了以下使用 Scala 的示例外,我没有找到任何 pyspark 代码来将矩阵转换为火花数据帧。有谁知道如何改用python?