Reshaping multiple values at once

Dir*_*way 17 r reshape dataframe reshape2 tidyr

I have a long dataset that I'd like to make wide, and I'm curious whether there is a way to do it all in one step with the reshape2 or tidyr packages in R.

The data frame df looks like this:

id  type    transactions    amount
20  income       20          100
20  expense      25          95
30  income       50          300
30  expense      45          250

I'd like to turn it into this:

id  income_transactions expense_transactions    income_amount   expense_amount
20       20                           25                 100             95
30       50                           45                 300             250

I know I can get part of the way there with reshape2, for example:

dcast(df, id ~  type, value.var="transactions")

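But is there a way to reshape the whole df in one go, handling both the "transactions" and "amount" variables at the same time? Ideally with new, more appropriate column names?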

A5C*_*2T1 28

In "reshape2", you can use recast (although, in my experience, it is not a widely known function).

library(reshape2)
recast(mydf, id ~ variable + type, id.var = c("id", "type"))
#   id transactions_expense transactions_income amount_expense amount_income
# 1 20                   25                  20             95           100
# 2 30                   45                  50            250           300

You can also use base R's reshape:

reshape(mydf, direction = "wide", idvar = "id", timevar = "type")
#   id transactions.income amount.income transactions.expense amount.expense
# 1 20                  20           100                   25             95
# 3 30                  50           300                   45            250
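Since the question also asks for nicer column names: as far as I know, base reshape() takes a `sep` argument (default ".") that controls how the wide column names are pasted together. A minimal sketch with an underscore separator follows; the data frame just rebuilds the question's data under the name `mydf` used in this answer, and `wide` is only an illustrative name.

```r
# Rebuild the question's data (called mydf in this answer)
mydf <- data.frame(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250),
  stringsAsFactors = FALSE
)

# sep = "_" yields transactions_income etc. instead of transactions.income
wide <- reshape(mydf, direction = "wide", idvar = "id",
                timevar = "type", sep = "_")
names(wide)
```

The names still follow the variable_type pattern; getting type_variable exactly as in the question would need a rename afterwards.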

Alternatively, you can melt and then dcast, like this (here with "data.table"):

library(data.table)
library(reshape2)
dcast.data.table(melt(as.data.table(mydf), id.vars = c("id", "type")), 
                 id ~ variable + type, value.var = "value")
#    id transactions_expense transactions_income amount_expense amount_income
# 1: 20                   25                  20             95           100
# 2: 30                   45                  50            250           300

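In later versions of dcast.data.table from "data.table" (1.9.8), you will be able to do this directly. If I understand correctly, what @Arun is trying to implement is reshaping without having to melt the data first; recast, by contrast, is essentially just a wrapper around a melt + dcast sequence.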
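To make the "wrapper" point concrete, here is a sketch of the same two steps with plain reshape2 on an ordinary data frame (no data.table): melt to long form, then dcast with both variable and type on the right-hand side of the formula. It should produce the same columns as the recast() call above; `long` and `wide` are just illustrative names.

```r
library(reshape2)

mydf <- data.frame(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250),
  stringsAsFactors = FALSE
)

# Step 1: stack transactions and amount into "variable"/"value" columns
long <- melt(mydf, id.vars = c("id", "type"))

# Step 2: spread every variable/type combination into its own column
wide <- dcast(long, id ~ variable + type, value.var = "value")
wide
```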


And, for completeness, here is the tidyr approach:

library(dplyr)
library(tidyr)
mydf %>% 
  gather(var, val, transactions:amount) %>% 
  unite(var2, type, var) %>% 
  spread(var2, val)
#   id expense_amount expense_transactions income_amount income_transactions
# 1 20             95                   25           100                  20
# 2 30            250                   45           300                  50
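For readers on a newer tidyr: gather() and spread() have since been superseded by pivot_longer() and pivot_wider(), and pivot_wider() accepts several value columns at once, so no gather/unite step is needed. A minimal sketch, assuming tidyr 1.1.0 or later (I believe that is where `names_glue` appeared); `names_glue` is used here to get the type_variable names the question asked for.

```r
library(tidyr)

mydf <- data.frame(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250),
  stringsAsFactors = FALSE
)

# values_from takes both measure columns; names_glue builds e.g. "income_amount"
wide <- pivot_wider(
  mydf,
  names_from  = type,
  values_from = c(transactions, amount),
  names_glue  = "{type}_{.value}"
)
wide
```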

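  • `reshape()` is simple? All I'll say to that is "bwahahahaha" (5 votes)
  • @hadley, I can't speak for David, but I didn't read his comment as saying `reshape()` is simple in general, only that the `reshape()` approach here is actually fairly simple. (3 votes)
  • In this case, the whole point of "messing around" with tidyr, dplyr, data.table, reshape2, etc. is that they generalize better to new problems, whereas reshape() does not. (2 votes)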

Aru*_*run 5

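With data.table v1.9.6+, we can cast multiple columns at once by passing them to value.var (and also use multiple aggregation functions via fun.aggregate). See ?dcast for more, including the examples section.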

library(data.table) # v1.9.6+
dt <- as.data.table(df) # df is the data frame from the question
dcast(dt, id ~ type, value.var = c("transactions", "amount"))
#    id transactions_expense transactions_income amount_expense amount_income
# 1: 20                   25                  20             95           100
# 2: 30                   45                  50            250           300