Converting to long format using pivot_longer

Question

tcv*_*992 5 pivot r reshape2 dplyr tidyverse

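I am trying to convert data to long format with tidyr::pivot_longer, but I cannot get it to do what I want. I can manage with reshape2::melt, but I would like to be able to achieve the same result with pivot_longer.

The data I am trying to reshape is the correlation matrix of the mtcars dataset: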

# Load packages
library(reshape2)  # melt()
library(dplyr)     # mutate() and the %>% pipe
library(tidyr)     # pivot_longer()
library(ggplot2)   # plotting

# Get the correlation matrix
mydata <- mtcars[, c(1,3,4,5,6,7)]
cormat <- round(cor(mydata),2)

head(cormat)
       mpg  disp    hp  drat    wt  qsec
mpg   1.00 -0.85 -0.78  0.68 -0.87  0.42
disp -0.85  1.00  0.79 -0.71  0.89 -0.43
hp   -0.78  0.79  1.00 -0.45  0.66 -0.71
drat  0.68 -0.71 -0.45  1.00 -0.71  0.09
wt   -0.87  0.89  0.66 -0.71  1.00 -0.17
qsec  0.42 -0.43 -0.71  0.09 -0.17  1.00

Next, I want to blank out the upper triangle of the matrix:

# Set the upper triangle of the correlation matrix to NA
# (the lower triangle and the diagonal are kept)
cormat[upper.tri(cormat)] <- NA
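
The masking step can be tried on a toy matrix first. The sketch below uses a hypothetical 3 x 3 matrix (the names a, b, c and the values are made up for illustration) to show that upper.tri() marks only the cells strictly above the diagonal, so assigning NA through it keeps the lower triangle and the diagonal.

```r
# Toy symmetric matrix; names and values are hypothetical
m <- matrix(c(1.0, 0.5, 0.2,
              0.5, 1.0, 0.3,
              0.2, 0.3, 1.0),
            nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# upper.tri() returns a logical matrix that is TRUE strictly above the diagonal
mask <- upper.tri(m)

# Assigning NA through the mask blanks those cells and leaves the rest alone
m_lower <- m
m_lower[mask] <- NA

n_masked <- sum(mask)            # cells above the diagonal
n_kept   <- sum(!is.na(m_lower)) # lower triangle plus diagonal
print(m_lower)
```

For an n x n matrix this leaves n(n + 1)/2 cells, which is 21 for the 6 x 6 matrix used here.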

Then I reshape it to long format:

# Reshape into a long format
melted_cormat <- 
  cormat %>% 
  melt(na.rm=TRUE)

head(melted_cormat)
  Var1 Var2 value
1  mpg  mpg  1.00
2 disp  mpg -0.85
3   hp  mpg -0.78
4 drat  mpg  0.68
5   wt  mpg -0.87
6 qsec  mpg  0.42

Finally, the figure I am producing is:

ggplot(data = melted_cormat, aes(Var2, Var1, fill = value))+
  geom_tile(color="white") +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1,1), 
                       #space = "Lab", 
                       name="Spearman\nCorrelation") +
  theme_minimal()+ 
  coord_fixed() +
  geom_text(aes(Var2, Var1, label = value), color = "black", size = 4) +
  theme(
    axis.text.x=element_text(family="Calibri", face="plain", color="black", size=12, angle=0), 
    axis.title.x=element_blank(),
    axis.title.y=element_blank(),
    panel.grid.major=element_blank(),
    panel.border=element_blank(),
    panel.background=element_blank(),
    axis.ticks = element_blank(),
    legend.justification = c(1, 0),
    legend.position = c(0.9, 0.3),
    legend.direction = "horizontal")+
  guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
                               title.position = "top", title.hjust = 0.5))

I cannot find a way to use pivot_longer in place of reshape2 such that the plot is still produced correctly. The following almost works (thanks to @geoff): the dataset appears to be correct, but the plot is not:

melted_cormat <- 
  cormat %>% 
  as_tibble() %>% 
  mutate(Var1 = colnames(cormat)) %>% 
  pivot_longer(names_to = "Var2", values_to = "value", mpg:qsec, values_drop_na=TRUE)

Running the same ggplot code as above then gives the incorrect plot.

Answer 1

geo*_*off 5

Does this achieve the behavior you need?

cormat |> 
  as_tibble() |> 
  mutate(Var1 = rownames(cormat)) |> 
  pivot_longer(names_to = "Var2", values_to = "val", mpg:qsec)

Output:

# A tibble: 36 x 3
   Var1  Var2    val
   <chr> <chr> <dbl>
 1 mpg   mpg    1   
 2 mpg   disp  -0.85
 3 mpg   hp    -0.78
 4 mpg   drat   0.68
 5 mpg   wt    -0.87
 6 mpg   qsec   0.42
 7 disp  mpg   -0.85
 8 disp  disp   1   
 9 disp  hp     0.79
10 disp  drat  -0.71
# ... with 26 more rows

Answer 2

Dan*_*ams 2

Compare str(melted_cormat) with str(pivoted_cormat). You will find that the older reshape2::melt() converts strings to factors, while tidyr::pivot_longer() keeps them as strings.

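The consequence is that with the melted version, ggplot() orders the rows and columns by factor level, which preserves the original order in cormat. In the second case they are plain strings, so they are simply placed in alphabetical order.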
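
The ordering difference can be seen without plotting anything. As a minimal base R sketch (the variable names `alpha_levels` and `kept_levels` are introduced here for illustration), converting the same names to a factor with and without explicit levels shows the two orders involved:

```r
# Variable names in the order they appear in the correlation matrix
vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

# Without explicit levels, factor() sorts the unique values,
# which is the alphabetical order a plain character column ends up in
alpha_levels <- levels(factor(vars))

# With explicit levels, the original order is preserved
kept_levels <- levels(factor(vars, levels = vars))

print(alpha_levels)
print(kept_levels)
```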

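To fix this, just mutate() the Var1 and Var2 columns into factors, using the original order of the columns in cormat as the levels. This will give you the plot you want.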
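
As one possible variation on this fix, the factor levels can also be set while reshaping rather than afterwards. The sketch below is an assumption-laden illustration, not the answer's own method: it uses a smaller three-variable matrix, sets Var1 as a factor before pivoting, and relies on the names_transform argument of pivot_longer() to turn Var2 into a factor with the original column order.

```r
library(tidyr)

# Small correlation matrix from three mtcars columns, upper triangle blanked
m <- round(cor(mtcars[, c("mpg", "disp", "hp")]), 2)
m[upper.tri(m)] <- NA

# Var1 becomes a factor whose levels follow the row order of the matrix
wide <- as.data.frame(m)
wide$Var1 <- factor(rownames(m), levels = rownames(m))

# names_transform converts the former column names into a factor as well
long <- pivot_longer(
  wide,
  cols = -Var1,
  names_to = "Var2",
  values_to = "value",
  values_drop_na = TRUE,
  names_transform = list(Var2 = function(x) factor(x, levels = colnames(m)))
)

str(long)
```

Both columns then carry the matrix order as their levels, so no later mutate() step is needed before plotting.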

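Observe the difference between the last two lines of the example below, and note that the default method for cor() is "pearson", so be careful when labelling the legend with the correlation method.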

# Load packages
library(tidyverse)
library(reshape2)

# define plotting function
plot_fun <- function(dat) {
  ggplot(data = dat, aes(Var2, Var1, fill = value)) +
    geom_tile(color = "white") +
    scale_fill_gradient2(
      low = "blue",
      high = "red",
      mid = "white",
      midpoint = 0,
      limit = c(-1, 1),
      #space = "Lab",
      name = "Spearman\nCorrelation"
    ) +
    theme_minimal() +
    coord_fixed() +
    geom_text(aes(Var2, Var1, label = value),
              color = "black",
              size = 4) +
    theme(
      axis.text.x = element_text(
        family = "Calibri",
        face = "plain",
        color = "black",
        size = 12,
        angle = 0
      ),
      axis.title.x = element_blank(),
      axis.title.y = element_blank(),
      panel.grid.major = element_blank(),
      panel.border = element_blank(),
      panel.background = element_blank(),
      axis.ticks = element_blank(),
      legend.justification = c(1, 0),
      legend.position = c(0.9, 0.3),
      legend.direction = "horizontal"
    ) +
    guides(fill = guide_colorbar(
      barwidth = 7,
      barheight = 1,
      title.position = "top",
      title.hjust = 0.5
    ))
}

# Get the correlation matrix
cormat <- mtcars[, c(1, 3, 4, 5, 6, 7)] %>%
  cor(., method = "spearman") %>% # note selection of correlation method
  round(2) %>%
  replace(upper.tri(.), NA)

# make melted version
melted <- cormat %>%
  melt(na.rm = TRUE)

# make pivoted version
pivoted <-
  cormat %>%
  as.data.frame() %>%
  rownames_to_column("Var1") %>%
  pivot_longer(
    -Var1,
    names_to = "Var2",
    values_to = "value",
    values_drop_na = TRUE
  )

# note column types on melted vs pivoted
str(melted)
#> 'data.frame':    21 obs. of  3 variables:
#>  $ Var1 : Factor w/ 6 levels "mpg","disp","hp",..: 1 2 3 4 5 6 2 3 4 5 ...
#>  $ Var2 : Factor w/ 6 levels "mpg","disp","hp",..: 1 1 1 1 1 1 2 2 2 2 ...
#>  $ value: num  1 -0.91 -0.89 0.65 -0.89 0.47 1 0.85 -0.68 0.9 ...
str(pivoted)
#> tibble [21 x 3] (S3: tbl_df/tbl/data.frame)
#>  $ Var1 : chr [1:21] "mpg" "disp" "disp" "hp" ...
#>  $ Var2 : chr [1:21] "mpg" "mpg" "disp" "mpg" ...
#>  $ value: num [1:21] 1 -0.91 1 -0.89 0.85 1 0.65 -0.68 -0.52 1 ...

# melted version gives desired plot
melted %>% 
  plot_fun()

# pivoted version orders variables in alphabetical order
pivoted %>% 
  plot_fun()

# turning the variable names into a factor fixes the plot
pivoted %>% 
  mutate(across(starts_with("Var"), ~factor(.x, levels = colnames(cormat)))) %>%
  plot_fun()

Created on 2022-01-12 by the reprex package (v2.0.1)
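
Returning to the note about the default method of cor(): a quick check on a single pair of mtcars columns makes the point concrete. This is a minimal sketch in base R; the object names are introduced here for illustration.

```r
# cor() with no method argument uses Pearson correlation
pearson_default  <- cor(mtcars$mpg, mtcars$disp)
pearson_explicit <- cor(mtcars$mpg, mtcars$disp, method = "pearson")

# Spearman must be requested explicitly and gives a different value
spearman <- cor(mtcars$mpg, mtcars$disp, method = "spearman")

same_default <- identical(pearson_default, pearson_explicit)
print(same_default)
print(round(c(pearson = pearson_default, spearman = spearman), 2))
```

The two rounded values correspond to the mpg/disp entries in the question's matrix (Pearson) and in the str() output above (Spearman), so a legend titled "Spearman" only matches the data when method = "spearman" is passed.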
