How can I use dplyr to group by the elements of x and compute the frequency of x over intervals of y?

Question

x <- c('a','v','c','a','d','e','g','f','h','y','u','R','S','W','S','d',
       'G','J','U','R','S','S','S','v','b','G','E','W','S','d','G','H',
       'J','I','T','E','W','W','q','q','d','v','b','M','M','K','L','U',
       'p','O','R','T','N','E','W','W','J','F','C','G','H','T','R','d',
       'E','W','W','W','Z','F','G','F','H','H','Y','R','F','F','L')

y <- sample(1:40, 79, replace = TRUE)

y
 [1] 38 18 19 19 37 38 26  4 32 23 11 24 36 15 22 19  6 24 13 36  2 26 35 39  8 33 20 19 23 28  5 17 40 26 18 21
[37] 35 23 27 12  3 33 16 32 11 19  4  5  8 19  5 19 33 33 33 13 12 32 21  4 14  8 28 34 33 22 34 19 39 23  6  8
[73] 37 17 21 16 38 15 36

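I have two variables, 'x' and 'y'. Some observations in 'x' occur more than once. Each value in 'y' corresponds to the observation at the same position in 'x'.

I want to group by 'x' and also partition the 'y' values into intervals.

In other words, the number of times a letter occurs should be broken down by interval, based on the value assigned to that letter at each occurrence.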

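For example:

[image]

I could not represent the table properly because I could not find a better way to enter it here.

I hope this is clear. If needed, I will try to restate it. Any help with this would be greatly appreciated.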

Answer 1

akr*_*run 12

Using dplyr

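library(dplyr)
library(tidyr)

# Bin y into (0,10], (10,20], (20,30], (30,40], count the rows per (x, bin),
# then spread the bins into columns, filling absent combinations with 0
res <- df %>%
  group_by(x, y = cut(y, breaks = seq(0, 40, by = 10))) %>%
  tally() %>%
  ungroup() %>%
  spread(y, n, fill = 0)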
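
To see how `cut` assigns values to the intervals used above, here is a small base R sketch. The vector `vals` is made up for illustration and is not part of the question's data. By default the intervals are closed on the right, so 10 falls in (0,10], while a value of exactly 0 would fall outside every interval and become NA. That does not matter for the question's y, which is sampled from 1:40.

```r
# Hypothetical values chosen to sit on and around the break points
vals <- c(0, 1, 10, 11, 20, 40)

# Same breaks as in the answer: 0, 10, 20, 30, 40
bins <- cut(vals, breaks = seq(0, 40, by = 10))

# Pair each value with the interval it was assigned to
print(data.frame(vals, bins))

# include.lowest = TRUE closes the first interval on the left,
# so 0 is placed in it instead of becoming NA
bins_incl <- cut(vals, breaks = seq(0, 40, by = 10), include.lowest = TRUE)
```

If the data could contain values on the lowest break, `include.lowest = TRUE` is one way to keep them from being dropped into NA.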


Or using data.table

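library(data.table)

# setDT() converts df to a data.table by reference; .N counts the rows in
# each (x, interval) group, and dcast() turns the intervals into columns
res1 <- dcast(
  setDT(df)[, .(N = .N), by = .(x, y1 = cut(y, breaks = seq(0, 40, by = 10)))],
  x ~ y1, value.var = "N", fill = 0L
)

# Both approaches give the same table
all.equal(as.data.frame(res), as.data.frame(res1))
#[1] TRUE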
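
As a package-free cross-check, the same kind of frequency table can be sketched in base R with `table()`. This is not part of the original answer, and the `toy` data frame below is hypothetical. Unlike the `spread` result, `table()` on a factor also keeps intervals that contain no observations, as columns of zeros.

```r
# Toy data (hypothetical, not the question's data)
toy <- data.frame(
  x = c("a", "a", "b", "b", "b"),
  y = c(12, 21, 9, 11, 15)
)

# Cross-tabulate each letter against the interval its y value falls into
counts <- table(toy$x, cut(toy$y, breaks = seq(0, 40, by = 10)))

# Convert the contingency table to a wide data frame, one row per letter
wide <- as.data.frame.matrix(counts)
print(wide)
```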


Note: cut has a labels argument, which you can use if you want the column headings to be freq0-10, etc.

 df %>%
   group_by(x, y = cut(y, breaks = seq(0, 40, by = 10),
                       labels = paste0("freq", c("0-10", "10-20", "20-30", "30-40")))) %>%
   tally() %>%
   ungroup() %>%
   spread(y, n, fill = 0) %>%
   head(2)

  #   x freq0-10 freq10-20 freq20-30 freq30-40
  #1 a        0         1         1         0
  #2 b        1         1         0         0
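
In current tidyr releases `spread` is superseded by `pivot_wider`. A minimal sketch of the same labelled table with `count` and `pivot_wider` might look like the following. It is not taken from the original answer, the `toy` data and the names `bin_labels` and `wide` are hypothetical, and it assumes a tidyr version that provides `pivot_wider` (1.0.0 or later).

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical, not the question's data)
toy <- data.frame(
  x = c("a", "a", "b", "b", "b"),
  y = c(12, 21, 9, 11, 15)
)

# Column headings in the style of the answer: freq0-10, freq10-20, ...
bin_labels <- paste0("freq", c("0-10", "10-20", "20-30", "30-40"))

wide <- toy %>%
  # Assign each y value to a labelled interval
  mutate(bin = cut(y, breaks = seq(0, 40, by = 10), labels = bin_labels)) %>%
  # Count the rows per (x, bin) pair; the count column is named n
  count(x, bin) %>%
  # One column per observed bin, with 0 where a letter has no values in it
  pivot_wider(names_from = bin, values_from = n, values_fill = list(n = 0L))

print(wide)
```

Only intervals that actually occur in the data become columns here, and they appear in order of first appearance rather than in interval order.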


Data

 df <-  structure(list(x = structure(c(1L, 22L, 3L, 1L, 4L, 5L, 7L, 6L, 
 8L, 24L, 21L, 18L, 19L, 23L, 19L, 4L, 7L, 10L, 21L, 18L, 19L, 
 19L, 19L, 22L, 2L, 7L, 5L, 23L, 19L, 4L, 7L, 8L, 10L, 9L, 20L, 
 5L, 23L, 23L, 17L, 17L, 4L, 22L, 2L, 13L, 13L, 11L, 12L, 21L, 
 16L, 15L, 18L, 20L, 14L, 5L, 23L, 23L, 10L, 6L, 3L, 7L, 8L, 20L, 
 18L, 4L, 5L, 23L, 23L, 23L, 25L, 6L, 7L, 6L, 8L, 8L, 24L, 18L, 
 6L, 6L, 12L), .Label = c("a", "b", "c", "d", "e", "f", "g", "h", 
 "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", 
 "v", "w", "y", "z"), class = "factor"), y = c(12L, 9L, 29L, 21L, 
 27L, 37L, 12L, 31L, 33L, 11L, 25L, 15L, 27L, 27L, 13L, 37L, 8L, 
 2L, 21L, 6L, 4L, 23L, 30L, 6L, 9L, 28L, 4L, 24L, 26L, 2L, 13L, 
 10L, 15L, 6L, 38L, 9L, 30L, 26L, 28L, 39L, 19L, 16L, 11L, 9L, 
 2L, 4L, 16L, 15L, 11L, 14L, 19L, 35L, 19L, 29L, 22L, 40L, 19L, 
 12L, 7L, 6L, 20L, 10L, 12L, 6L, 30L, 13L, 38L, 39L, 30L, 20L, 
 6L, 9L, 1L, 40L, 26L, 14L, 23L, 33L, 2L)), .Names = c("x", "y"
 ), row.names = c(NA, -79L), class = "data.frame")

