Related solutions (0)

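How can I spread repeated measures of multiple variables into wide format?

I'm trying to take columns that are in long format and spread them to wide format, as shown below. I'd like to solve this with tidyr, since that is the data manipulation toolset I'm investing in, but to make this answer more general, please provide other solutions as well.

This is what I have: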

library(dplyr); library(tidyr)

set.seed(10)
dat <- tibble(
    Person = rep(c("greg", "sally", "sue"), each=2),
    Time = rep(c("Pre", "Post"), 3),
    Score1 = round(rnorm(6, mean = 80, sd=4), 0),
    Score2 = round(jitter(Score1, 15), 0),
    Score3 = 5 + (Score1 + Score2)/2
)

##   Person Time Score1 Score2 Score3
## 1   greg  Pre     80     78   84.0
## 2   greg Post     79     80   84.5
## 3  sally  Pre     75     74   79.5
## 4  sally Post     78     78   83.0
## 5    sue  Pre     81     78 …

r tidyr

63 votes, 3 answers, 10k views
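
For the repeated-measures question above, one possible approach is sketched here under a few assumptions. With tidyr 1.0.0 or later, `pivot_wider()` accepts several columns in `values_from`, so all score columns can be widened in one call. The small data frame is made up for illustration (the question's own output is truncated), and the resulting names such as `Score1_Pre` simply follow `pivot_wider()`'s default naming.

```r
library(tidyr)

# Hypothetical stand-in for the question's data: two persons, two time points.
dat <- data.frame(
  Person = rep(c("greg", "sally"), each = 2),
  Time   = rep(c("Pre", "Post"), 2),
  Score1 = c(80, 79, 75, 78),
  Score2 = c(78, 80, 74, 78),
  stringsAsFactors = FALSE
)

# One row per Person; each Score column is split into one column per Time.
wide <- pivot_wider(
  dat,
  id_cols     = Person,
  names_from  = Time,
  values_from = c(Score1, Score2)
)

print(wide)
```

The same idea should extend to `Score3` by adding it to `values_from`. On older tidyr versions, a `gather()`, `unite()`, `spread()` chain is a common alternative.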

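Reshaping multiple value columns to wide format

I have the following data frame, and I want to use cast to create a "pivot table" with columns for two values (value and percentage). Here is the data frame: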

expensesByMonth <- structure(list(month = c("2012-02-01", "2012-02-01", "2012-02-01", 
"2012-02-01", "2012-02-01", "2012-02-01", "2012-02-01", "2012-02-01", 
"2012-02-01", "2012-02-01", "2012-02-01", "2012-02-01", "2012-03-01", 
"2012-03-01", "2012-03-01", "2012-03-01", "2012-03-01", "2012-03-01", 
"2012-03-01", "2012-03-01", "2012-03-01", "2012-03-01", "2012-03-01", 
"2012-03-01", "2012-03-01", "2012-03-01", "2012-03-01", "2012-04-01", 
"2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01", 
"2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01", 
"2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01", 
"2012-04-01", "2012-04-01", "2012-05-01", "2012-05-01", "2012-05-01", 
"2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01", 
"2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01", 
"2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01", 
"2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01", 
"2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01", 
"2012-06-01", "2012-06-01", …

r reshape r-faq

22 votes, 4 answers, 30k views
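
For the pivot-table question above, here is a minimal sketch using `reshape()` from base R, which widens every non-id column at once. The real `expensesByMonth` data is truncated, so the data frame and the column names `category`, `value` and `percent` below are assumptions made purely for illustration.

```r
# Hypothetical long data: only the month column is known from the question;
# category, value and percent are assumed names.
expenses <- data.frame(
  month    = rep(c("2012-02-01", "2012-03-01"), each = 2),
  category = rep(c("food", "rent"), 2),
  value    = c(100, 300, 150, 350),
  percent  = c(25, 75, 30, 70),
  stringsAsFactors = FALSE
)

# One row per category; each month contributes a value.<month> column
# and a percent.<month> column.
wide <- reshape(
  expenses,
  idvar     = "category",
  timevar   = "month",
  direction = "wide"
)

print(wide)
```

With reshape2, which the question's mention of cast suggests, a comparable result is usually reached by melting both value columns first and then casting on the combined key.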

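Using spread to create two value columns with tidyr

I have a data frame that looks like this (see link). I'd like to take the output produced below and go one step further by spreading the tone variable across both the n and the avg variables. It seems this topic might bear on the problem, but I can't get it to work: Is it possible to use spread on multiple columns in tidyr similar to dcast?

I'd like the final table to have the source variable in one column, followed by the tone-n and tone-avg variables in columns. So the column headers would be "source" - "For - n" - "Against - n" - "For - Avg" - "Against - Avg". This is for publication, not for further calculation, so it's about presenting the data. Presenting the data this way seems more intuitive to me. Thanks.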

#variable1
Politician.For<-sample(seq(0,4,1),50, replace=TRUE)
#variable2
Politician.Against<-sample(seq(0,4,1),50, replace=TRUE)
#Variable3
Activist.For<-sample(seq(0,4,1),50,replace=TRUE)
#variable4
Activist.Against<-sample(seq(0,4,1),50,replace=TRUE)
#dataframe
df<-data.frame(Politician.For, Politician.Against, Activist.For,Activist.Against)

#tidyr
library(dplyr); library(tidyr)
df %>%
 #Gather all columns 
 gather(df) %>%
 #separate by the period character 
 #(the default separator is any non-alphanumeric character) 
 separate(col=df, into=c('source', 'tone')) %>%
 #group by both source and tone  
 group_by(source,tone) %>%
 #summarise to create counts and average
 summarise(n=sum(value), avg=mean(value)) %>%
 #try to spread (this is the step that does not work)
 spread(tone, c('n', 'value'))

r spread tidyr

4 votes, 1 answer, 1797 views
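
For the two-value-column question above, `spread()` takes a single value column, so one workaround is to stack `n` and `avg` into one column first, build a combined key, and spread on that. The sketch below makes several assumptions: a seed is added for reproducibility, `n` is computed with `n()` on the guess that a count was intended (the question's code uses `sum(value)`), and the helper column names `variable`, `stat`, `val` and `tone_stat` are invented for illustration.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # added so the sketch is reproducible
df <- data.frame(
  Politician.For     = sample(0:4, 50, replace = TRUE),
  Politician.Against = sample(0:4, 50, replace = TRUE),
  Activist.For       = sample(0:4, 50, replace = TRUE),
  Activist.Against   = sample(0:4, 50, replace = TRUE)
)

result <- df %>%
  # stack all four columns into key/value pairs
  gather(key = "variable", value = "value") %>%
  # split "Politician.For" into source and tone on the literal period
  separate(variable, into = c("source", "tone"), sep = "\\.") %>%
  group_by(source, tone) %>%
  summarise(n = n(), avg = mean(value), .groups = "drop") %>%
  # stack n and avg so that a single value column remains
  gather(key = "stat", value = "val", n, avg) %>%
  # combined header such as "For_n" or "Against_avg"
  unite("tone_stat", tone, stat, sep = "_") %>%
  spread(tone_stat, val)

print(result)
```

The result has one row per source and one column per tone and statistic pair. With tidyr 1.0.0 or later, `pivot_wider(names_from = tone, values_from = c(n, avg))` after the `summarise()` step is a shorter route to a similar table.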

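dplyr join with subgroups

The problem below can be seen as a "two-column reshape to wide", and there are several classical ways to solve it, from base::reshape (horror) to reshape2. For the two-group case, a simple join of the subgroups works best.

Can I reformulate the join within the dplyr piping framework? The example below is a bit silly, but I need the join inside a longer piped chain that I don't want to break.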

library(dplyr)
d = data.frame(subject= rep(1:5,each=2),treatment=letters[1:2],bp = rnorm(10))

# Desired pipeline shape (placeholder only, so it is commented out):
# d %>%
#   # Assume piped manipulations here
#   # Make wide
#   # Assume additional piped manipulations here

# Make wide (old style)
with(d,left_join(d[treatment=="a",],
          d[treatment=="b",],by="subject" ))

r reshape2 dplyr magrittr

3 votes, 1 answer, 5031 views
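
For the subgroup-join question above, one way to keep the join inside the chain is to wrap it in braces. With the magrittr pipe, a braced expression does not receive the left-hand side as its first argument, so the `.` placeholder can be used twice. This is a sketch only: the seed, the `suffix` values and the final `select()` are additions for illustration, not part of the question.

```r
library(dplyr)

set.seed(1)  # added so the sketch is reproducible
d <- data.frame(
  subject   = rep(1:5, each = 2),
  treatment = letters[1:2],
  bp        = rnorm(10)
)

wide <- d %>%
  # earlier piped manipulations would go here
  {
    left_join(
      filter(., treatment == "a"),
      filter(., treatment == "b"),
      by     = "subject",
      suffix = c(".a", ".b")
    )
  } %>%
  # later piped manipulations continue from the joined result
  select(subject, bp.a, bp.b)

print(wide)
```

The braces matter: without them, `%>%` would also pass `d` as the first argument of `left_join()`.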

Tag statistics

r ×4

tidyr ×2

dplyr ×1

magrittr ×1

r-faq ×1

reshape ×1

reshape2 ×1

spread ×1