Dir*_*way 17 r reshape dataframe reshape2 tidyr
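I have a long-format dataset that I want to make wide, and I'm curious whether there is a way to do it all in one step using the reshape2 or tidyr packages in R.
The data frame df looks like this: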
id type transactions amount
20 income 20 100
20 expense 25 95
30 income 50 300
30 expense 45 250
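For anyone who wants to run the answers, the table above can be rebuilt as a data frame. This is only a sketch: the column types are assumed (numeric values, character type column), since the post shows printed values only. The answers below refer to the same data as mydf, so that name is defined as well.

```r
# Rebuild the example data shown above (column types are assumed).
df <- data.frame(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250),
  stringsAsFactors = FALSE
)

# The answers use the name `mydf` for the same data.
mydf <- df
```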
I'd like to get to this:
id income_transactions expense_transactions income_amount expense_amount
20 20 25 100 95
30 50 45 300 250
I know I can get part of the way there with reshape2, for example:
dcast(df, id ~ type, value.var="transactions")
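But is there a way to reshape the entire df in one shot, handling both the transactions and amount variables at once? Ideally with new, more appropriate column names?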
A5C*_*2T1 28
In "reshape2", you can use recast (though in my experience this is not a widely known function).
library(reshape2)
recast(mydf, id ~ variable + type, id.var = c("id", "type"))
# id transactions_expense transactions_income amount_expense amount_income
# 1 20 25 20 95 100
# 2 30 45 50 250 300
You could also use base R's reshape:
reshape(mydf, direction = "wide", idvar = "id", timevar = "type")
# id transactions.income amount.income transactions.expense amount.expense
# 1 20 20 100 25 95
# 3 30 50 300 45 250
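If underscore-separated names and clean row names are preferred, base reshape also accepts a sep argument. The following is a sketch rather than part of the original answer: it rebuilds mydf with assumed column types so it runs on its own, and the final sub() call is just one possible way to put the type first in each name, as the question asked.

```r
# Example data from the question (column types are assumed).
mydf <- data.frame(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250),
  stringsAsFactors = FALSE
)

# sep = "_" gives transactions_income instead of transactions.income.
wide <- reshape(mydf, direction = "wide", idvar = "id",
                timevar = "type", sep = "_")

# reshape() keeps the original row names (1 and 3); reset them.
rownames(wide) <- NULL

# Optional: swap the two halves of each name (transactions_income
# becomes income_transactions). "id" has no underscore and is untouched.
names(wide) <- sub("^(.*)_(.*)$", "\\2_\\1", names(wide))
wide
```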
Or you can melt and dcast, like this (here with "data.table"):
library(data.table)
library(reshape2)
dcast.data.table(melt(as.data.table(mydf), id.vars = c("id", "type")),
id ~ variable + type, value.var = "value")
# id transactions_expense transactions_income amount_expense amount_income
# 1: 20 25 20 95 100
# 2: 30 45 50 250 300
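In later versions of dcast.data.table from "data.table" (1.9.8), you will be able to do this directly. If I understand correctly, what @Arun is trying to implement is reshaping without first having to melt the data. That melting step is what currently happens with recast, which is essentially a wrapper for a melt + dcast sequence of operations.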
And, for thoroughness, here is the tidyr approach:
library(dplyr)
library(tidyr)
mydf %>%
gather(var, val, transactions:amount) %>%
unite(var2, type, var) %>%
spread(var2, val)
# id expense_amount expense_transactions income_amount income_transactions
# 1 20 95 25 100 20
# 2 30 250 45 300 50
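In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(), and pivot_wider() can spread several value columns in a single call. The following is a sketch that is not part of the original answer. It assumes tidyr >= 1.0.0 is installed and rebuilds mydf with assumed column types; names_glue is used to put the type first in each column name, matching the layout requested in the question.

```r
library(tidyr)  # assumes tidyr >= 1.0.0

# Example data from the question (column types are assumed).
mydf <- data.frame(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250),
  stringsAsFactors = FALSE
)

# Spread both value columns at once; {.value} stands for the name of
# the value column (transactions or amount) in names_glue.
wide <- pivot_wider(
  mydf,
  names_from  = type,
  values_from = c(transactions, amount),
  names_glue  = "{type}_{.value}"
)
wide
```

Without names_glue, the default names put the value column first (transactions_income and so on), like the recast and dcast outputs above.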
With data.table v1.9.6+, we can cast multiple value.var columns simultaneously (and also use multiple aggregation functions in fun.aggregate). See ?dcast and its Examples section for more.
require(data.table) # v1.9.6+
dt <- as.data.table(mydf) # assumed: dt is the example data as a data.table
dcast(dt, id ~ type, value.var=names(dt)[3:4])
# id transactions_expense transactions_income amount_expense amount_income
# 1: 20 25 20 95 100
# 2: 30 45 50 250 300