Is it possible to use spread on multiple columns in tidyr, similar to dcast?

Question

I have the following dummy data:

library(dplyr)
library(tidyr)
library(reshape2)
dt <- expand.grid(Year = 1990:2014,
                  Product = LETTERS[1:8],
                  Country = paste0(LETTERS, "I")) %>%
  select(Product, Country, Year)
dt$value <- rnorm(nrow(dt))

I selected two product-country combinations:

sdt <- dt %>%
  filter((Product == "A" & Country == "AI") | (Product == "B" & Country == "EI"))

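I want to see the values for each combination side by side. I can do this with dcast:

# value.var is named explicitly so dcast does not have to guess the value column
sdt %>% dcast(Year ~ Product + Country, value.var = "value")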

Is it possible to do this with spread from the tidyr package?

Answer 1

akr*_*run 59

One option is to create a new column 'Prod_Count' by joining the 'Product' and 'Country' columns with paste, drop those two columns with select, and reshape from 'long' to 'wide' with spread from tidyr.

library(dplyr)
library(tidyr)

sdt %>%
  mutate(Prod_Count = paste(Product, Country, sep = "_")) %>%
  select(-Product, -Country) %>%
  spread(Prod_Count, value) %>%
  head(2)
#   Year      A_AI       B_EI
# 1 1990 0.7878674  0.2486044
# 2 1991 0.2343285 -1.1694878
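
To check that the paste + spread route gives the same table as dcast, here is a small self-contained sketch. The reduced grid (three years, two products, two countries), the fixed seed, and the names wide_spread and wide_dcast are assumptions made for illustration; they are not part of the original question.

```r
library(dplyr)
library(tidyr)
library(reshape2)

set.seed(1)  # assumption: fixed seed so the dummy values are repeatable

# Smaller stand-in for the question's `dt` (hypothetical size)
dt <- expand.grid(Year = 1990:1992,
                  Product = c("A", "B"),
                  Country = c("AI", "EI"),
                  stringsAsFactors = FALSE)
dt$value <- rnorm(nrow(dt))

sdt <- dt %>%
  filter((Product == "A" & Country == "AI") | (Product == "B" & Country == "EI"))

# Route 1: build the combined key by hand, then spread
wide_spread <- sdt %>%
  mutate(Prod_Count = paste(Product, Country, sep = "_")) %>%
  select(-Product, -Country) %>%
  spread(Prod_Count, value)

# Route 2: dcast with two variables on the right-hand side
wide_dcast <- dcast(sdt, Year ~ Product + Country, value.var = "value")

# Same column names and same values in both results
identical(names(wide_spread), names(wide_dcast))
all.equal(wide_spread, wide_dcast, check.attributes = FALSE)
```

Both routes join the key values with an underscore, which is why the column names line up.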

Alternatively, we can skip a few steps by using unite from tidyr (from @beetroot's comment) and then reshape as before.

sdt %>%
  unite(Prod_Count, Product, Country) %>%
  spread(Prod_Count, value) %>%
  head(2)
#   Year      A_AI       B_EI
# 1 1990 0.7878674  0.2486044
# 2 1991 0.2343285 -1.1694878
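
unite joins the columns with an underscore by default; its sep argument changes that. A minimal sketch on hand-made data follows. The toy data frame and its values are hypothetical stand-ins for sdt, chosen so the result is easy to read.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for `sdt`
toy <- data.frame(
  Year    = c(1990L, 1991L, 1990L, 1991L),
  Product = c("A", "A", "B", "B"),
  Country = c("AI", "AI", "EI", "EI"),
  value   = c(0.5, -1.2, 2.0, 0.3)
)

wide <- toy %>%
  unite(Prod_Count, Product, Country, sep = ".") %>%  # "A.AI", "B.EI"
  spread(Prod_Count, value)

names(wide)
```

unite drops the input columns by default (remove = TRUE), so no separate select step is needed before spread.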

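This is the Hadley-approved way to solve this problem ;) (27 upvotes)
Well, there is `unite()`, but it seems to work only with numeric data (though maybe that serves the purpose here?). (9 upvotes)
@hadley This is an unusually ugly solution for the tidyverse. All columns have to be listed several times, and worse, they lose their types, so everything has to be coerced back to numeric. (7 upvotes)
After consulting this post many times over the past few months, I find the reshape2/dcast-based solution the most elegant. See also http://stackoverflow.com/questions/27418919/dplyr-with-subgroup-join, where the spread-based solution does not generalize to multiple grouping columns, while the reshape-based one does. (5 upvotes)
@beetroot, thanks, yes, that seems to work: `sdt %>% unite(Prod_Count, Product, Country) %>% spread(Prod_Count, value) %>% head()` (4 upvotes)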

Answer 2

hpl*_*ger 6

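With pivot_wider(), a new function introduced in tidyr version 1.0.0, this can be done in a single function call.

pivot_wider() (counterpart: pivot_longer()) is similar to spread(). However, it offers additional functionality, such as using multiple key/name columns (and/or multiple value columns). To do this, the argument names_from, which indicates the column(s) the names of the new variables are taken from, can take more than one column name (here Product and Country).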

library("tidyr")

sdt %>% 
    pivot_wider(id_cols = Year,
                names_from = c(Product, Country)) %>% 
    head(2)
#> # A tibble: 2 x 3
#>     Year   A_AI    B_EI
#>    <int>  <dbl>   <dbl>
#>  1  1990 -2.08  -0.113 
#>  2  1991 -1.02  -0.0546
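
The "multiple value columns" case mentioned above can be sketched as follows. The toy data frame, the extra units column, and the "." separator are assumptions added for illustration; they do not appear in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with two value columns of different types
toy <- data.frame(
  Year    = c(1990L, 1991L, 1990L, 1991L),
  Product = c("A", "A", "B", "B"),
  Country = c("AI", "AI", "EI", "EI"),
  value   = c(0.5, -1.2, 2.0, 0.3),
  units   = c(10L, 12L, 7L, 9L)
)

wide <- toy %>%
  pivot_wider(
    id_cols     = Year,
    names_from  = c(Product, Country),  # two name columns
    values_from = c(value, units),      # two value columns
    names_sep   = "."                   # separator used in the new column names
  )

names(wide)
sapply(wide, class)
```

Each new column name combines the value column with the Product and Country values, and each value column keeps its own type (double for value, integer for units), so no coercion back to numeric is needed.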

See also: https://tidyr.tidyverse.org/articles/pivot.html

Archived:	11 years, 6 months ago
Views:	28262
Last activity:	6 years, 3 months ago