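I recently posted a question about reshaping data from a long table to a wide one, and found that spread() is a very handy function for this. Now I need to take that earlier post a step further.
Suppose we have a table like this: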
id1 | id2 | info | action_time | action_comment |
1 | a | info1 | time1 | comment1 |
1 | a | info1 | time2 | comment2 |
1 | a | info1 | time3 | comment3 |
2 | b | info2 | time4 | comment4 |
2 | b | info2 | time5 | comment5 |
I want to turn it into something like this:
id1 | id2 | info | action_time1 | action_comment1 | action_time2 | action_comment2 | action_time3 | action_comment3 |
1 | a | info1 | time1 | comment1 | time2 | comment2 | time3 | comment3 |
2 | b | info2 | time4 | comment4 | time5 | comment5 | | |
The difference from my previous question is that I've added another column that also needs to be reshaped.
I was thinking of using
library(dplyr)
library(tidyr)
df %>%
group_by(id1) %>%
mutate(action_no = paste("action_time", row_number())) %>%
spread(action_no, value = c(action_time, action_comment))
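But when I pass two columns to the value argument, I get the error "Invalid column specification".
I really like the idea of manipulating data with the %>% operator like this, so I'd love to know how to fix my code to make it work.
Many thanks for your help.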
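We can do this with the devel version of data.table, which accepts multiple value.var columns. Instructions for installing the devel version are here.
We convert the 'data.frame' to a 'data.table' (setDT(df)), create a sequence variable ('ind') grouped by 'id1', 'id2' and 'info', and then dcast from 'long' to 'wide' format, specifying value.var as 'action_time' and 'action_comment'.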
library(data.table) # v1.9.5+
setDT(df)[, ind := 1:.N, .(id1, id2, info)]
dcast(df, id1 + id2 + info ~ ind,
      value.var = c('action_time', 'action_comment'), fill = '')
# id1 id2 info 1_action_time 2_action_time 3_action_time 1_action_comment
#1: 1 a info1 time1 time2 time3 comment1
#2: 2 b info2 time4 time5 comment4
# 2_action_comment 3_action_comment
#1: comment2 comment3
#2: comment5
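A possible variant for later CRAN releases of data.table, sketched under the assumption that the installed version provides rowid() (added, as far as I know, around v1.9.8) and accepts several value.var columns in dcast(). rowid() builds the same per-group counter as 1:.N above. The generated column names may differ from the output shown (recent versions tend to produce names like action_time_1), so check names(wide_dt); the object name wide_dt is only illustrative.

```r
library(data.table)

# Sample data from the question
df <- data.frame(
  id1 = c(1L, 1L, 1L, 2L, 2L),
  id2 = c("a", "a", "a", "b", "b"),
  info = c("info1", "info1", "info1", "info2", "info2"),
  action_time = c("time1", "time2", "time3", "time4", "time5"),
  action_comment = c("comment1", "comment2", "comment3", "comment4", "comment5"),
  stringsAsFactors = FALSE
)

setDT(df)
# rowid() numbers rows 1, 2, 3, ... within each (id1, id2, info) group,
# the same counter that 1:.N builds in the answer above
df[, ind := rowid(id1, id2, info)]

# Cast both action columns at once; missing slots are filled with ""
wide_dt <- dcast(df, id1 + id2 + info ~ ind,
                 value.var = c("action_time", "action_comment"), fill = "")
print(wide_dt)
```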
Or use reshape from base R: we create the sequence variable ('ind') with ave, then use reshape to go from 'long' to 'wide' format.
df$ind <- with(df, ave(seq_along(id1), id1, id2, info, FUN = seq_along))
reshape(df, idvar = c('id1', 'id2', 'info'), timevar = 'ind', direction = 'wide')
# id1 id2 info action_time.1 action_comment.1 action_time.2 action_comment.2
#1 1 a info1 time1 comment1 time2 comment2
#4 2 b info2 time4 comment4 time5 comment5
# action_time.3 action_comment.3
#1 time3 comment3
#4 <NA> <NA>
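To make the ave() step concrete, here is a minimal check on the same sample data (rebuilt with data.frame() so the snippet runs on its own). ave() splits seq_along(id1) by id1, id2 and info and applies seq_along within each group, so the counter restarts at 1 for every group; reshape() then uses it as timevar. The name wide is only illustrative.

```r
# Sample data from the question, rebuilt with data.frame()
df <- data.frame(
  id1 = c(1L, 1L, 1L, 2L, 2L),
  id2 = c("a", "a", "a", "b", "b"),
  info = c("info1", "info1", "info1", "info2", "info2"),
  action_time = c("time1", "time2", "time3", "time4", "time5"),
  action_comment = c("comment1", "comment2", "comment3", "comment4", "comment5"),
  stringsAsFactors = FALSE
)

# Per-group row counter: seq_along is applied within each (id1, id2, info) group
df$ind <- with(df, ave(seq_along(id1), id1, id2, info, FUN = seq_along))
print(df$ind)  # values 1 2 3 1 2

# Long to wide: one action_time.k / action_comment.k pair per value of ind
wide <- reshape(df, idvar = c("id1", "id2", "info"),
                timevar = "ind", direction = "wide")
print(wide)
```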
# Sample data used in the examples above
df <- structure(list(id1 = c(1L, 1L, 1L, 2L, 2L), id2 = c("a", "a",
"a", "b", "b"), info = c("info1", "info1", "info1", "info2",
"info2"), action_time = c("time1", "time2", "time3", "time4",
"time5"), action_comment = c("comment1", "comment2", "comment3",
"comment4", "comment5")), .Names = c("id1", "id2", "info", "action_time",
"action_comment"), class = "data.frame", row.names = c(NA, -5L))
Try:
library(dplyr)
library(tidyr)
df %>%
group_by(id1) %>%
mutate(id = row_number()) %>%
gather(key, value, -(id1:info), -id) %>%
unite(id_key, id, key) %>%
spread(id_key, value)
This gives:
#Source: local data frame [2 x 9]
# id1 id2 info 1_action_comment 1_action_time 2_action_comment 2_action_time 3_action_comment 3_action_time
#1 1 a info1 comment1 time1 comment2 time2 comment3 time3
#2 2 b info2 comment4 time4 comment5 time5 NA NA
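For readers on tidyr 1.0.0 or later, spread() and gather() are superseded by pivot_wider() and pivot_longer(), and pivot_wider() accepts several columns in values_from, which is essentially what the question's spread(..., value = c(action_time, action_comment)) attempt was reaching for. A minimal sketch, assuming recent versions of dplyr and tidyr; grouping by all three id columns and the name wide are choices made here for illustration, not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  id1 = c(1L, 1L, 1L, 2L, 2L),
  id2 = c("a", "a", "a", "b", "b"),
  info = c("info1", "info1", "info1", "info2", "info2"),
  action_time = c("time1", "time2", "time3", "time4", "time5"),
  action_comment = c("comment1", "comment2", "comment3", "comment4", "comment5"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(id1, id2, info) %>%
  mutate(id = row_number()) %>%   # 1, 2, 3, ... within each group
  ungroup() %>%
  # Both value columns are widened in one call; missing slots become NA
  pivot_wider(names_from = id,
              values_from = c(action_time, action_comment))
print(wide)
```

By default the new columns are named after the value column and the counter joined by an underscore (action_time_1, action_comment_1, ...); the names_sep or names_glue arguments can change that.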