26 r
我使用以下方法计算我的数据集的相关性:
cor( var1, var2, method = "method")
Run Code Online (Sandbox Code Playgroud)
但我喜欢创建4个不同变量的相关矩阵.最简单的方法是什么?
dar*_*zig 35
cor
在数据框上使用相同的函数(),例如:
> cor(VADeaths)
Rural Male Rural Female Urban Male Urban Female
Rural Male 1.0000000 0.9979869 0.9841907 0.9934646
Rural Female 0.9979869 1.0000000 0.9739053 0.9867310
Urban Male 0.9841907 0.9739053 1.0000000 0.9918262
Urban Female 0.9934646 0.9867310 0.9918262 1.0000000
Run Code Online (Sandbox Code Playgroud)
或者,在同时保存离散变量的数据框(有时也称为因子)上,尝试以下内容:
> cor(mtcars[,unlist(lapply(mtcars, is.numeric))])
mpg cyl disp hp drat wt qsec vs am gear carb
mpg 1.0000000 -0.8521620 -0.8475514 -0.7761684 0.68117191 -0.8676594 0.41868403 0.6640389 0.59983243 0.4802848 -0.55092507
cyl -0.8521620 1.0000000 0.9020329 0.8324475 -0.69993811 0.7824958 -0.59124207 -0.8108118 -0.52260705 -0.4926866 0.52698829
disp -0.8475514 0.9020329 1.0000000 0.7909486 -0.71021393 0.8879799 -0.43369788 -0.7104159 -0.59122704 -0.5555692 0.39497686
hp -0.7761684 0.8324475 0.7909486 1.0000000 -0.44875912 0.6587479 -0.70822339 -0.7230967 -0.24320426 -0.1257043 0.74981247
drat 0.6811719 -0.6999381 -0.7102139 -0.4487591 1.00000000 -0.7124406 0.09120476 0.4402785 0.71271113 0.6996101 -0.09078980
wt -0.8676594 0.7824958 0.8879799 0.6587479 -0.71244065 1.0000000 -0.17471588 -0.5549157 -0.69249526 -0.5832870 0.42760594
qsec 0.4186840 -0.5912421 -0.4336979 -0.7082234 0.09120476 -0.1747159 1.00000000 0.7445354 -0.22986086 -0.2126822 -0.65624923
vs 0.6640389 -0.8108118 -0.7104159 -0.7230967 0.44027846 -0.5549157 0.74453544 1.0000000 0.16834512 0.2060233 -0.56960714
am 0.5998324 -0.5226070 -0.5912270 -0.2432043 0.71271113 -0.6924953 -0.22986086 0.1683451 1.00000000 0.7940588 0.05753435
gear 0.4802848 -0.4926866 -0.5555692 -0.1257043 0.69961013 -0.5832870 -0.21268223 0.2060233 0.79405876 1.0000000 0.27407284
carb -0.5509251 0.5269883 0.3949769 0.7498125 -0.09078980 0.4276059 -0.65624923 -0.5696071 0.05753435 0.2740728 1.00000000
Run Code Online (Sandbox Code Playgroud)
dor*_*ien 17
如果您想将矩阵与一些可视化相结合,我可以推荐(我使用的是内置iris
数据集):
library(psych)
pairs.panels(iris[1:4]) # select columns 1-4
Run Code Online (Sandbox Code Playgroud)
性能分析基本上相同,但默认情况下包含显着性指标.
library(PerformanceAnalytics)
chart.Correlation(iris[1:4])
Run Code Online (Sandbox Code Playgroud)
或者这个简单易用的可视化:
library(corrplot)
x <- cor(iris[1:4])
corrplot(x, type="upper", order="hclust")
Run Code Online (Sandbox Code Playgroud)
查看包中的corr.test
功能psych
:
> corr.test(mtcars[1:4])
Call:corr.test(x = mtcars[1:4])
Correlation matrix
mpg cyl disp hp
mpg 1.00 -0.85 -0.85 -0.78
cyl -0.85 1.00 0.90 0.83
disp -0.85 0.90 1.00 0.79
hp -0.78 0.83 0.79 1.00
Sample Size
mpg cyl disp hp
mpg 32 32 32 32
cyl 32 32 32 32
disp 32 32 32 32
hp 32 32 32 32
Probability value
mpg cyl disp hp
mpg 0 0 0 0
cyl 0 0 0 0
disp 0 0 0 0
hp 0 0 0 0
Run Code Online (Sandbox Code Playgroud)
还有另一个无耻的自我广告:https://gist.github.com/887249