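I'm trying to take columns that are in long format and spread them to wide format, as shown below. I'd like to solve this with tidyr, since that is the data manipulation tooling I'm investing in, but to make the answers more general, please provide other solutions as well.
Here's what I have: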
library(dplyr); library(tidyr)
set.seed(10)
# tibble() replaces the deprecated data_frame(); later columns can refer to earlier ones
dat <- tibble(
    Person = rep(c("greg", "sally", "sue"), each = 2),
    Time = rep(c("Pre", "Post"), 3),
    Score1 = round(rnorm(6, mean = 80, sd = 4), 0),
    Score2 = round(jitter(Score1, 15), 0),
    Score3 = 5 + (Score1 + Score2) / 2
)
##   Person Time Score1 Score2 Score3
## 1   greg  Pre     80     78   84.0
## 2   greg Post     79     80   84.5
## 3  sally  Pre     75     74   79.5
## 4  sally Post     78     78   83.0
## 5    sue  Pre     81     78   84.5
## 6    sue Post     82     81   86.5
Desired wide format:
  Person Pre.Score1 Pre.Score2 Pre.Score3 Post.Score1 Post.Score2 Post.Score3
1   greg         80         78       84.0          79          80        84.5
2  sally         75         74       79.5          78          78        83.0
3    sue         81         78       84.5          82          81        86.5
I can do it by doing something like this for each score:
spread(dat %>% select(Person, Time, Score1), Time, Score1) %>%
rename(Score1_Pre = Pre, Score1_Post = Post)
and then combining the results with a _join, but that seems verbose, and there has to be a better way.
kon*_*vas 81
If you want to stick with tidyr/dplyr:
dat %>%
gather(temp, score, starts_with("Score")) %>%
unite(temp1, Time, temp, sep = ".") %>%
spread(temp1, score)
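For readers on a newer tidyr, the same reshaping can be sketched with pivot_wider, which supersedes gather/spread. This is only an illustration and not part of the original answer: it assumes tidyr 1.1.0 or later (for names_glue), and the data frame is typed in by hand from the values printed in the question so the example does not depend on the random seed.

```r
library(tidyr)

# Values copied from the printed data in the question (assumption: entered by hand)
dat <- data.frame(
  Person = rep(c("greg", "sally", "sue"), each = 2),
  Time   = rep(c("Pre", "Post"), 3),
  Score1 = c(80, 79, 75, 78, 81, 82),
  Score2 = c(78, 80, 74, 78, 78, 81),
  Score3 = c(84.0, 84.5, 79.5, 83.0, 84.5, 86.5)
)

# One row per Person; every Score column is split by Time.
# names_glue builds column names such as Pre.Score1 and Post.Score3.
wide <- pivot_wider(
  dat,
  names_from  = Time,
  values_from = starts_with("Score"),
  names_glue  = "{Time}.{.value}"
)

print(wide)
```

The columns may come out in a different order than in the desired layout; if the order matters, they can be rearranged afterwards with select.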
Bro*_*ieG 23
Using reshape2:
library(reshape2)
dcast(melt(dat), Person ~ Time + variable)
Produces:
Using Person, Time as id variables
  Person Post_Score1 Post_Score2 Post_Score3 Pre_Score1 Pre_Score2 Pre_Score3
1   greg          79          80        84.5         80         78       84.0
2  sally          78          78        83.0         75         74       79.5
3    sue          82          81        86.5         81         78       84.5
akr*_*run 21
Using dcast from the data.table package:
library(data.table)  # v1.9.5+
dcast(setDT(dat), Person~Time, value.var=paste0("Score", 1:3))
# Person Score1_Post Score1_Pre Score2_Post Score2_Pre Score3_Post Score3_Pre
#1: greg 79 80 80 78 84.5 84.0
#2: sally 78 75 78 74 83.0 79.5
#3: sue 82 81 81 78 86.5 84.5
Or reshape from base R:
reshape(as.data.frame(dat), idvar='Person', timevar='Time',direction='wide')
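As a rough sketch of what the base R call yields: reshape names the new columns variable.time (for example Score1.Pre), which is the reverse of the Pre.Score1 style requested in the question. The renaming step below is an addition for illustration, not part of the original answer, and the data frame is typed in by hand from the values printed in the question.

```r
# Values copied from the printed data in the question (assumption: entered by hand)
dat <- data.frame(
  Person = rep(c("greg", "sally", "sue"), each = 2),
  Time   = rep(c("Pre", "Post"), 3),
  Score1 = c(80, 79, 75, 78, 81, 82),
  Score2 = c(78, 80, 74, 78, 78, 81),
  Score3 = c(84.0, 84.5, 79.5, 83.0, 84.5, 86.5)
)

# Long to wide: one row per Person, one column per Score/Time combination
wide <- reshape(dat, idvar = "Person", timevar = "Time", direction = "wide")

# Swap "Score1.Pre" into "Pre.Score1" for every reshaped column
score_cols <- names(wide) != "Person"
names(wide)[score_cols] <- sub("^(.*)\\.(.*)$", "\\2.\\1", names(wide)[score_cols])

print(wide)
```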