Group and count to get the closerate

Rhu*_*lsb 10 grouping r counting dataframe

For each country, I want to count the number of times status is open and the number of times status is closed, and then calculate the closerate per country.

Data:

customer <- c(1,2,3,4,5,6,7,8,9)
country <- c('BE', 'NL', 'NL','NL','BE','NL','BE','BE','NL')
closeday <- c('2017-08-23', '2017-08-05', '2017-08-22', '2017-08-26', 
'2017-08-25', '2017-08-13', '2017-08-30', '2017-08-05', '2017-08-23')
closeday <- as.Date(closeday)

df <- data.frame(customer,country,closeday)

Adding status:

df$status <- ifelse(df$closeday < '2017-08-20', 'open', 'closed') 

  customer country   closeday status
1        1      BE 2017-08-23 closed
2        2      NL 2017-08-05   open
3        3      NL 2017-08-22 closed
4        4      NL 2017-08-26 closed
5        5      BE 2017-08-25 closed
6        6      NL 2017-08-13   open
7        7      BE 2017-08-30 closed
8        8      BE 2017-08-05   open
9        9      NL 2017-08-23 closed

Calculating the closerate:

closerate <- length(which(df$status == 'closed')) / 
(length(which(df$status == 'closed')) + length(which(df$status == 'open')))

[1] 0.6666667

Obviously, this is the closerate over the whole data set. The challenge is to get the closerate per country. I tried adding the closerate calculation to df:

df$closerate <- length(which(df$status == 'closed')) / 
(length(which(df$status == 'closed')) + length(which(df$status == 'open')))

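But this gives every row a closerate of 0.66, because I am not grouping. I believe I should not use the length function, since the counting can be done by grouping. I read some material about using dplyr to count logical outputs per group, but I could not get it to work.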

This is the desired output:

(image: grouped by country)

d.b*_*d.b 7

aggregate(list(output = df$status == "closed"),
          list(country = df$country),
          function(x)
              c(close = sum(x),
                open = length(x) - sum(x),
                rate = mean(x)))
#  country output.close output.open output.rate
#1      BE         3.00        1.00        0.75
#2      NL         3.00        2.00        0.60
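One thing to be aware of with the aggregate call above: because the function returns a vector of three values, the result has only two columns, country and a matrix column output, even though the printout looks like four columns. A minimal sketch of flattening it into ordinary columns with do.call(data.frame, ...) follows. The names res and flat are only illustrative, and the data frame is rebuilt here so the snippet runs on its own.

```r
# Rebuild the relevant columns of the question's data so this runs standalone.
df <- data.frame(
  country = c('BE', 'NL', 'NL', 'NL', 'BE', 'NL', 'BE', 'BE', 'NL'),
  status  = c('closed', 'open', 'closed', 'closed', 'closed',
              'open', 'closed', 'open', 'closed')
)

# Same aggregate call as above; `res` is a hypothetical name for the result.
res <- aggregate(list(output = df$status == "closed"),
                 list(country = df$country),
                 function(x)
                     c(close = sum(x),
                       open = length(x) - sum(x),
                       rate = mean(x)))

# `output` is a single matrix column, so res has 2 columns, not 4.
is.matrix(res$output)

# Expand the matrix column into separate, ordinary columns.
flat <- do.call(data.frame, res)
names(flat)
```

After flattening, a column such as flat$output.rate can be used like any other numeric column.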

The table solution that was posted in the comments seems to have been deleted. Anyway, you could also use table:

output = as.data.frame.matrix(table(df$country, df$status))
output$closerate = output$closed/(output$closed + output$open)
output
#   closed open closerate
#BE      3    1      0.75
#NL      3    2      0.60
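If the goal is instead to keep one row per customer and attach the per-country closerate as a column of df (which is what the question's df$closerate attempt was going for), base R's ave can compute a grouped mean and return it aligned with the original rows. This is only a sketch of that alternative, not one of the approaches above; the data frame is rebuilt here so the snippet runs on its own.

```r
# Rebuild the relevant columns of the question's data so this runs standalone.
df <- data.frame(
  customer = 1:9,
  country  = c('BE', 'NL', 'NL', 'NL', 'BE', 'NL', 'BE', 'BE', 'NL'),
  status   = c('closed', 'open', 'closed', 'closed', 'closed',
               'open', 'closed', 'open', 'closed')
)

# The mean of a 0/1 indicator is the share of closed rows.
# ave() applies FUN within each country and returns one value per row.
df$closerate <- ave(as.numeric(df$status == 'closed'), df$country, FUN = mean)

# One row per country, if a summary is wanted as well.
unique(df[, c('country', 'closerate')])
```

Every BE row gets 0.75 and every NL row gets 0.60, matching the per-country rates computed above.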