Using two value columns in the spread() function in R

Question

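I just posted a question asking how to reshape data from long to wide format. I then found that spread() is a very handy function for this, so now I need to build on my previous post.

Suppose we have a table like this: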

id1   |  id2 |  info  | action_time | action_comment  |
 1    | a    |  info1 |    time1    |        comment1 |
 1    | a    |  info1 |    time2    |        comment2 |
 1    | a    |  info1 |    time3    |        comment3 |
 2    | b    |  info2 |    time4    |        comment4 |
 2    | b    |  info2 |    time5    |        comment5 |

I want to reshape it into something like this:

id1   |  id2 |  info  |action_time 1|action_comment1 |action_time 2|action_comment2 |action_time 3|action_comment3  |
 1    | a    |  info1 |    time1    |      comment1  |    time2    |      comment2  |    time3    |      comment3   |
 2    | b    |  info2 |    time4    |      comment4  |    time5    |      comment5  |             |                 |

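So the difference between this question and my previous one is that I have added another column that also needs to be reshaped.

I was thinking of using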

library(dplyr)
library(tidyr)

df %>% 
  group_by(id1) %>% 
  mutate(action_no = paste("action_time", row_number())) %>%
  spread(action_no, value = c(action_time, action_comment))

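But when I pass two columns to the value argument, it gives an error message: Invalid column specification.

I really like the idea of manipulating data with the %>% operator like this, so I would love to know how to correct my code to achieve this.

Many thanks for your help.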

Answer 1

akr*_*run 8

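We can do this with the devel version of data.table, in which dcast can take multiple value.var columns. Instructions for installing the devel version are here.

We convert the 'data.frame' to a 'data.table' (setDT(df)), create a sequence variable ('ind') within the grouping variables ('id1', 'id2', 'info'), and use dcast to go from 'long' to 'wide' format, specifying value.var as 'action_time' and 'action_comment'.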

library(data.table)#v1.9.5+
setDT(df)[, ind:= 1:.N, .(id1, id2, info)]
dcast(df, id1 + id2 + info ~ ind,
      value.var=c('action_time', 'action_comment'), fill='')
 #    id1 id2  info 1_action_time 2_action_time 3_action_time 1_action_comment
 #1:   1   a info1         time1         time2         time3         comment1
 #2:   2   b info2         time4         time5                       comment4
 #   2_action_comment 3_action_comment
 #1:         comment2         comment3
 #2:         comment5
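
For readers on a current CRAN release, a minimal sketch of the same approach follows. It assumes data.table 1.9.6 or later, where dcast() accepts several value.var columns. The object names dt and wide are only illustrative, and as.data.table() is used instead of setDT() so that df itself is left unchanged. Newer releases may name the result columns differently from the output shown above, so treat the column names as version-dependent.

```r
library(data.table)

# Same example data as in the "Data" section of this answer
df <- data.frame(
  id1 = c(1L, 1L, 1L, 2L, 2L),
  id2 = c("a", "a", "a", "b", "b"),
  info = c("info1", "info1", "info1", "info2", "info2"),
  action_time = c("time1", "time2", "time3", "time4", "time5"),
  action_comment = c("comment1", "comment2", "comment3",
                     "comment4", "comment5"),
  stringsAsFactors = FALSE
)

dt <- as.data.table(df)  # works on a copy; df is not modified

# Number the actions within each id1/id2/info group
dt[, ind := seq_len(.N), by = .(id1, id2, info)]

# Cast both value columns at once; missing combinations become ""
wide <- dcast(dt, id1 + id2 + info ~ ind,
              value.var = c("action_time", "action_comment"),
              fill = "")
print(wide)
```

The result should have one row per id1/id2/info group and three columns for each of the two value columns.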

Or use reshape from base R. We create the sequence variable ('ind') using with and ave, then use reshape to change from 'long' to 'wide' format.

df$ind <- with(df, ave(seq_along(id1), id1, id2, info, FUN=seq_along))
reshape(df, idvar=c('id1', 'id2', 'info'),timevar='ind', direction='wide')
#  id1 id2  info action_time.1 action_comment.1 action_time.2 action_comment.2
#1   1   a info1         time1         comment1         time2         comment2
#4   2   b info2         time4         comment4         time5         comment5
#  action_time.3 action_comment.3
#1         time3         comment3
#4          <NA>             <NA>

Data

df <- structure(list(id1 = c(1L, 1L, 1L, 2L, 2L), id2 = c("a", "a", 
"a", "b", "b"), info = c("info1", "info1", "info1", "info2", 
"info2"), action_time = c("time1", "time2", "time3", "time4", 
"time5"), action_comment = c("comment1", "comment2", "comment3", 
"comment4", "comment5")), .Names = c("id1", "id2", "info", "action_time", 
"action_comment"), class = "data.frame", row.names = c(NA, -5L))

Answer 2

Ste*_*pré 6

Try:

library(dplyr)
library(tidyr)

df %>%
  group_by(id1) %>%
  mutate(id = row_number()) %>%
  gather(key, value, -(id1:info), -id) %>%
  unite(id_key, id, key) %>%
  spread(id_key, value)

Which gives:

#Source: local data frame [2 x 9]

#  id1 id2  info 1_action_comment 1_action_time 2_action_comment 2_action_time 3_action_comment 3_action_time
#1   1   a info1         comment1         time1         comment2         time2         comment3         time3
#2   2   b info2         comment4         time4         comment5         time5               NA            NA
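
In newer tidyr releases the same reshape can be written more directly. The sketch below is not part of the original answer: it assumes tidyr 1.0.0 or later, where pivot_wider() accepts several columns in values_from. The helper column name action_no is borrowed from the question, and wide is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same example data as used in the answers above
df <- data.frame(
  id1 = c(1L, 1L, 1L, 2L, 2L),
  id2 = c("a", "a", "a", "b", "b"),
  info = c("info1", "info1", "info1", "info2", "info2"),
  action_time = c("time1", "time2", "time3", "time4", "time5"),
  action_comment = c("comment1", "comment2", "comment3",
                     "comment4", "comment5"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(id1, id2, info) %>%
  mutate(action_no = row_number()) %>%  # 1, 2, 3 within each group
  ungroup() %>%
  pivot_wider(
    names_from = action_no,
    values_from = c(action_time, action_comment)  # both value columns
  )
print(wide)
```

With the default naming, the new columns are called action_time_1, action_comment_1 and so on, and groups with fewer actions are padded with NA. The action_time columns come before the action_comment columns; select() can reorder them if an interleaved layout is wanted.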
