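I'm trying to use tidyr::pivot_longer to convert my data to a longer format, but I can't get it to do what I want. I can manage with reshape2::melt, but I'd also like to be able to achieve the same result with pivot_longer.
The data I'm trying to reshape is the correlation matrix of the mtcars dataset: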
# Load packages
library(reshape2)
library(dplyr)
library(tidyr)    # provides pivot_longer()
library(ggplot2)  # provides ggplot()
# Get the correlation matrix
mydata <- mtcars[, c(1,3,4,5,6,7)]
cormat <- round(cor(mydata),2)
head(cormat)
       mpg  disp    hp  drat    wt  qsec
mpg   1.00 -0.85 -0.78  0.68 -0.87  0.42
disp -0.85  1.00  0.79 -0.71  0.89 -0.43
hp   -0.78  0.79  1.00 -0.45  0.66 -0.71
drat  0.68 -0.71 -0.45  1.00 -0.71  0.09
wt   -0.87  0.89  0.66 -0.71  1.00 -0.17
qsec  0.42 -0.43 -0.71  0.09 -0.17  1.00
Next, I want to blank out the upper triangle of the matrix:
# Set the upper triangle of the correlation matrix to NA
# (use lower.tri() instead to blank the lower triangle)
cormat[upper.tri(cormat)] <- NA
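To make the masking step concrete, here is a minimal sketch of what `upper.tri()` selects. The 3 x 3 matrix `m` and the name `mask` are made up for illustration and are not part of the original data.

```r
# Hypothetical 3 x 3 matrix, used only to illustrate upper.tri()
m <- matrix(1:9, nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# upper.tri() returns a logical matrix that is TRUE strictly above the diagonal
mask <- upper.tri(m)

# Assigning NA through the mask blanks the upper triangle and keeps
# the diagonal and the lower triangle
m[mask] <- NA
m
```

Three cells become NA here, and the diagonal and everything below it are untouched, which is the shape the reshaping step then has to cope with.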
Then I reshape it into a long format:
# Reshape into a long format
melted_cormat <-
  cormat %>%
  melt(na.rm = TRUE)
head(melted_cormat)
   Var1 Var2 value value_2
1   mpg  mpg  1.00       1
7   mpg disp -0.85   -0.85
8  disp disp  1.00       1
13  mpg   hp -0.78   -0.78
14 disp   hp  0.79    0.79
15   hp   hp  1.00       1
Finally, this is the figure I'm producing:
ggplot(data = melted_cormat, aes(Var2, Var1, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limit = c(-1, 1),
                       #space = "Lab",
                       name = "Spearman\nCorrelation") +
  theme_minimal() +
  coord_fixed() +
  geom_text(aes(Var2, Var1, label = value), color = "black", size = 4) +
  theme(
    axis.text.x = element_text(family = "Calibri", face = "plain", color = "black", size = 12, angle = 0),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.major = element_blank(),
    panel.border = element_blank(),
    panel.background = element_blank(),
    axis.ticks = element_blank(),
    legend.justification = c(1, 0),
    legend.position = c(0.9, 0.3),
    legend.direction = "horizontal") +
  guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
                               title.position = "top", title.hjust = 0.5))
I can't seem to find a way to use pivot_longer instead of reshape2 that still produces the figure correctly. The following almost works (thanks @geoff): the data set looks right, but the plot does not:
melted_cormat <-
  cormat %>%
  as_tibble() %>%
  mutate(Var1 = colnames(cormat)) %>%
  pivot_longer(names_to = "Var2", values_to = "value", mpg:qsec, values_drop_na = TRUE)
Does this achieve the behavior you need?
cormat |>
  as_tibble() |>
  mutate(Var1 = rownames(cormat)) |>
  pivot_longer(names_to = "Var2", values_to = "val", mpg:qsec)
Output:
# A tibble: 36 x 3
   Var1  Var2    val
   <chr> <chr> <dbl>
 1 mpg   mpg    1
 2 mpg   disp  -0.85
 3 mpg   hp    -0.78
 4 mpg   drat   0.68
 5 mpg   wt    -0.87
 6 mpg   qsec   0.42
 7 disp  mpg   -0.85
 8 disp  disp   1
 9 disp  hp     0.79
10 disp  drat  -0.71
# ... with 26 more rows
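Compare str(melted_cormat) with str(pivoted_cormat). You'll see that the older reshape2::melt() converts the strings to factors, while tidyr::pivot_longer() keeps them as strings.
The consequence is that with the melted version, ggplot() orders the rows and columns by factor level, which preserves the original order in cormat. In the second case they are plain strings, so they are simply sorted alphabetically.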
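The ordering difference can be seen without any plotting. The sketch below uses only base R, and the vector `vars` is typed in by hand to mirror the column order of cormat. It is an illustration of the level ordering, not a look inside ggplot2.

```r
# Variable names in the order they appear in cormat
vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

# A factor built without explicit levels sorts its levels alphabetically,
# the same order you get on the axis when the column is plain character data
levels(factor(vars))

# Supplying the levels explicitly preserves the original order
levels(factor(vars, levels = vars))
```

The first call returns the names sorted alphabetically (disp, drat, hp, mpg, qsec, wt). The second returns them in the original matrix order.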
To fix this, just mutate() Var1 and Var2 into factors, using the original column order in cormat as the levels. This will give you the plot you want.
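As a variation on that fix, the conversion of Var2 can also happen inside the pivot itself through the `names_transform` argument of pivot_longer() (available in tidyr 1.1.0 and later, as far as I know). This is a sketch only: the small matrix `m` and the names `lv` and `long` are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Small correlation matrix used only for illustration
m <- round(cor(mtcars[, c("mpg", "disp", "hp")]), 2)
lv <- colnames(m)  # original variable order, reused as factor levels

long <- m %>%
  as.data.frame() %>%
  rownames_to_column("Var1") %>%
  pivot_longer(
    -Var1,
    names_to = "Var2",
    values_to = "value",
    # convert the new name column to a factor while pivoting
    names_transform = list(Var2 = function(x) factor(x, levels = lv))
  ) %>%
  # Var1 is not created by the pivot, so it is converted separately
  mutate(Var1 = factor(Var1, levels = lv))

levels(long$Var1)
levels(long$Var2)
```

Both columns end up as factors whose levels follow the matrix order, so the plot should come out the same as with a mutate() applied after the pivot.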
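Observe the difference between the last two calls to plot_fun() in the example below, and note that the default method for cor is "pearson", so be careful when labelling the legend with the correlation method.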
# Load packages
library(tidyverse)
library(reshape2)
# define plotting function
plot_fun <- function(dat) {
  ggplot(data = dat, aes(Var2, Var1, fill = value)) +
    geom_tile(color = "white") +
    scale_fill_gradient2(
      low = "blue",
      high = "red",
      mid = "white",
      midpoint = 0,
      limit = c(-1, 1),
      #space = "Lab",
      name = "Spearman\nCorrelation"
    ) +
    theme_minimal() +
    coord_fixed() +
    geom_text(aes(Var2, Var1, label = value),
              color = "black",
              size = 4) +
    theme(
      axis.text.x = element_text(
        family = "Calibri",
        face = "plain",
        color = "black",
        size = 12,
        angle = 0
      ),
      axis.title.x = element_blank(),
      axis.title.y = element_blank(),
      panel.grid.major = element_blank(),
      panel.border = element_blank(),
      panel.background = element_blank(),
      axis.ticks = element_blank(),
      legend.justification = c(1, 0),
      legend.position = c(0.9, 0.3),
      legend.direction = "horizontal"
    ) +
    guides(fill = guide_colorbar(
      barwidth = 7,
      barheight = 1,
      title.position = "top",
      title.hjust = 0.5
    ))
}
# Get the correlation matrix
cormat <- mtcars[, c(1, 3, 4, 5, 6, 7)] %>%
  cor(., method = "spearman") %>% # note selection of correlation method
  round(2) %>%
  replace(upper.tri(.), NA)
# make melted version
melted <- cormat %>%
  melt(na.rm = TRUE)
# make pivoted version
pivoted <-
  cormat %>%
  as.data.frame() %>%
  rownames_to_column("Var1") %>%
  pivot_longer(
    -Var1,
    names_to = "Var2",
    values_to = "value",
    values_drop_na = TRUE
  )
# note column types on melted vs pivoted
str(melted)
#> 'data.frame': 21 obs. of 3 variables:
#> $ Var1 : Factor w/ 6 levels "mpg","disp","hp",..: 1 2 3 4 5 6 2 3 4 5 ...
#> $ Var2 : Factor w/ 6 levels "mpg","disp","hp",..: 1 1 1 1 1 1 2 2 2 2 ...
#> $ value: num 1 -0.91 -0.89 0.65 -0.89 0.47 1 0.85 -0.68 0.9 ...
str(pivoted)
#> tibble [21 x 3] (S3: tbl_df/tbl/data.frame)
#> $ Var1 : chr [1:21] "mpg" "disp" "disp" "hp" ...
#> $ Var2 : chr [1:21] "mpg" "mpg" "disp" "mpg" ...
#> $ value: num [1:21] 1 -0.91 1 -0.89 0.85 1 0.65 -0.68 -0.52 1 ...
# melted version gives desired plot
melted %>%
  plot_fun()

# pivoted version orders variables in alphabetical order
pivoted %>%
  plot_fun()

# turning the variable names into a factor fixes the plot
pivoted %>%
  mutate(across(starts_with("Var"), ~factor(.x, levels = colnames(cormat)))) %>%
  plot_fun()

Created on 2022-01-12 by the reprex package (v2.0.1)
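As a follow-up to the note about the default method of cor(), the sketch below checks the default against an explicit "pearson" call and compares it with "spearman" for one pair of variables. The names `dat`, `method` and `legend_title` are hypothetical, and building the legend title from the method is just one way to keep the label and the computation in sync.

```r
# Same columns as in the example above
dat <- mtcars[, c(1, 3, 4, 5, 6, 7)]

# cor() uses Pearson's method unless told otherwise
identical(cor(dat), cor(dat, method = "pearson"))

# Spearman's rank correlation gives different values for these data
round(cor(dat$mpg, dat$disp), 2)
round(cor(dat$mpg, dat$disp, method = "spearman"), 2)

# Hypothetical helper: derive the legend title from the method actually used
method <- "spearman"
legend_title <- paste0(toupper(substring(method, 1, 1)),
                       substring(method, 2),
                       "\nCorrelation")
legend_title
```

For mpg and disp, the rounded values match the outputs shown earlier: -0.85 with the default method and -0.91 with Spearman. Passing `legend_title` to the `name` argument of scale_fill_gradient2() would then keep the legend consistent with whichever method is chosen.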