Log*_*ald 4 r matrix correlation
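I have about 20 variables describing different cities, each coded "Y" or "N", and they are factors. The variables are things like "has a co-op". I'd like to find some correlations and possibly use the corrplot package to show the connections between all of these variables. But for some reason I can't coerce the variables into a form that corrplot, or even cor(), will accept, so that I can put them in a matrix. I tried: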
M <- cor(model.matrix(~.-1,data=mydata[c(25:44)]))
But the result in corrplot comes out really weird. Does anyone have a quick way to turn a bunch of Y/N answers into a correlation matrix? Thanks!
You can use the sjp.corr function or the sjt.corr function from the sjPlot package for graphical or tabular output.
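# stringsAsFactors = TRUE keeps the columns as factors (the default is FALSE since R 4.0);
# without it, as.integer() on the character columns returns NA
DF <- data.frame(v1 = sample(c("Y","N"), 100, TRUE),
                 v2 = sample(c("Y","N"), 100, TRUE),
                 v3 = sample(c("Y","N"), 100, TRUE),
                 v4 = sample(c("Y","N"), 100, TRUE),
                 v5 = sample(c("Y","N"), 100, TRUE),
                 stringsAsFactors = TRUE)
# replace each factor by its integer level codes
DF[] <- lapply(DF, as.integer)
library(sjPlot)
# function names as in the original answer; they may differ in newer sjPlot releases
sjp.corr(DF)  # plot
sjt.corr(DF)  # table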
The plot: [image not preserved]

The table (in the RStudio viewer pane): [image not preserved]
You can use many arguments to modify the appearance of the plot or table; see here for some examples.
For binary variables, you might consider cross tabs (the table function in R).
However, getting the correlation matrix is pretty straightforward:
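# example data
set.seed(1)
# stringsAsFactors = TRUE is needed since R 4.0, where character columns are no longer
# converted to factors by default (as.integer() on "Y"/"N" strings would give NA)
DF <- data.frame(x = sample(c("Y","N"), 100, TRUE),
                 y = sample(c("Y","N"), 100, TRUE),
                 stringsAsFactors = TRUE)
# how to get correlation: replace each factor by its integer level codes
DF[] <- lapply(DF, as.integer)
cor(DF)
# output reported in the original answer (exact values depend on the R version's
# random number generator):
#            x          y
# x  1.0000000 -0.0369479
# y -0.0369479  1.0000000
# visualize it
library(corrplot)
corrplot(cor(DF))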
When you convert to integer in this example, "N" is 1 and "Y" is 2. I'm not sure if that holds generally (for R's storage of factors). To have a look at the mapping for your data, try lapply(DF,levels) before converting to integer.
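If you would rather not depend on how the factor levels happen to be ordered, one option is to code the answers explicitly. The following is only a sketch on made-up data (the names DF01 and M are illustrative, not from the question): it maps "Y" to 1 and "N" to 0 by comparison, which works whether the columns are factors or plain character vectors.

```r
# Hypothetical example data: two columns of "Y"/"N" answers stored as character
set.seed(1)
DF <- data.frame(x = sample(c("Y", "N"), 100, TRUE),
                 y = sample(c("Y", "N"), 100, TRUE),
                 stringsAsFactors = FALSE)

# Inspect the level order each column would get if converted with factor()
# (levels are sorted by default, so "N" comes before "Y")
lapply(DF, function(col) levels(factor(col)))

# Explicit coding: "Y" -> 1, "N" -> 0, independent of any level order
DF01 <- as.data.frame(lapply(DF, function(col) as.integer(col == "Y")))

# Correlation matrix of the 0/1 indicators
M <- cor(DF01)
M
```

Because a 0/1 coding and a 1/2 coding differ only by a constant shift, both give the same correlations as long as "Y" is the higher code in every column. The sign of a correlation flips only if one column is coded the other way round.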
To me, the plot makes sense. If you have questions about the statistical interpretation of correlations in this context, you should consider having a look at http://stats.stackexchange.com
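For reference, the cross-tab suggestion from the start of this answer could look roughly like the sketch below, again on made-up data. It also checks a known identity: for two binary variables, the Pearson correlation of the 0/1 indicators equals the phi coefficient computed from the 2x2 table. The variable names (tab, phi, r) are illustrative only.

```r
# Hypothetical example data: two "Y"/"N" variables
set.seed(1)
DF <- data.frame(x = sample(c("Y", "N"), 100, TRUE),
                 y = sample(c("Y", "N"), 100, TRUE))

# 2x2 cross tab of counts, and row proportions
tab <- table(x = DF$x, y = DF$y)
print(tab)
print(prop.table(tab, margin = 1))

# Phi coefficient from the cell counts and the marginal totals
n11 <- as.numeric(tab["Y", "Y"]); n00 <- as.numeric(tab["N", "N"])
n10 <- as.numeric(tab["Y", "N"]); n01 <- as.numeric(tab["N", "Y"])
phi <- (n11 * n00 - n10 * n01) /
  sqrt(prod(as.numeric(rowSums(tab))) * prod(as.numeric(colSums(tab))))

# Pearson correlation of the 0/1 indicators, for comparison
r <- cor(as.integer(DF$x == "Y"), as.integer(DF$y == "Y"))
all.equal(phi, r)
```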