Let's say I have this data frame:
Date DayOfWeek Url Hits
09/01/2016 Thursday url1 3
09/01/2016 Thursday url2 5
09/01/2016 Thursday url3 4
09/02/2016 Friday url1 7
09/02/2016 Friday url3 6
09/03/2016 Saturday url2 9
09/03/2016 Saturday url1 5
09/04/2016 Sunday url2 6
09/07/2016 Wednesday url10 4
09/07/2016 Thursday url2 3
09/07/2016 Thursday url4 2
09/07/2016 Thursday url5 3
09/07/2016 Thursday url1 3
09/08/2016 Friday url1 3
09/08/2016 Friday url4 3
09/08/2016 Friday url5 2
09/08/2016 Friday url8 6
09/09/2016 Saturday url2 1
09/09/2016 Saturday url3 2
09/09/2016 Saturday url5 4
09/09/2016 Saturday url1 8
09/14/2016 Thursday url1 3
09/14/2016 Thursday url2 2
09/14/2016 Thursday url3 3
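I want to find the busiest day of the week in terms of the number of unique URLs visited. For example, there are 3 Thursdays in the data frame: the first Thursday has 3 unique URLs visited, the second has 4, and the last has 3. What I plan to do is: sum of unique URLs (3 + 4 + 3) / number of Thursdays (3) = the average number of unique URLs for that day.
For Friday, the first one has 2 URLs and the second has 4, so the calculation would be (2 + 4) / 2 (the number of Fridays in the data set) = 3.
I am trying to solve this with dplyr. I have been trying group_by, but I can't seem to work out the right combination of functions to get what I need.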
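Written out as a formula, the calculation described in the question looks roughly like the following. The symbols are notation introduced here for illustration only: $D_w$ is the set of distinct dates falling on weekday $w$, and $u_t$ is the number of unique URLs visited on date $t$.

```latex
\[
  \text{avg}_w = \frac{1}{|D_w|} \sum_{t \in D_w} u_t,
  \qquad
  \text{avg}_{\text{Thursday}} = \frac{3 + 4 + 3}{3} \approx 3.33,
  \qquad
  \text{avg}_{\text{Friday}} = \frac{2 + 4}{2} = 3 .
\]
```

The busiest day is then the weekday $w$ with the largest $\text{avg}_w$.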
We get the number of distinct 'Url' ('N') for each 'Date' and 'DayOfWeek' with n_distinct, and then take the mean of 'N' for each 'DayOfWeek'.
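library(dplyr)
df1 %>%
  # step 1: count the distinct URLs seen on each calendar date
  group_by(Date, DayOfWeek) %>%
  summarise(N = n_distinct(Url)) %>%
  # step 2: average those per-date counts within each day of the week
  group_by(DayOfWeek) %>%
  summarise(N = mean(N))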
# DayOfWeek N
# <chr> <dbl>
#1 Friday 3.000000
#2 Saturday 3.000000
#3 Sunday 1.000000
#4 Thursday 3.333333
#5 Wednesday 1.000000
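For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the question's table is retyped into `df1` via `read.table` (with the last three rows all dated 09/14/2016), and it uses `distinct()` followed by `count()` as an equivalent alternative to `n_distinct()`. The names `result` and `busiest` are introduced here for illustration and are not part of the original answer.

```r
library(dplyr)

# Retyped from the question's table; Date is read as a character column.
df1 <- read.table(text = "
Date DayOfWeek Url Hits
09/01/2016 Thursday url1 3
09/01/2016 Thursday url2 5
09/01/2016 Thursday url3 4
09/02/2016 Friday url1 7
09/02/2016 Friday url3 6
09/03/2016 Saturday url2 9
09/03/2016 Saturday url1 5
09/04/2016 Sunday url2 6
09/07/2016 Wednesday url10 4
09/07/2016 Thursday url2 3
09/07/2016 Thursday url4 2
09/07/2016 Thursday url5 3
09/07/2016 Thursday url1 3
09/08/2016 Friday url1 3
09/08/2016 Friday url4 3
09/08/2016 Friday url5 2
09/08/2016 Friday url8 6
09/09/2016 Saturday url2 1
09/09/2016 Saturday url3 2
09/09/2016 Saturday url5 4
09/09/2016 Saturday url1 8
09/14/2016 Thursday url1 3
09/14/2016 Thursday url2 2
09/14/2016 Thursday url3 3
", header = TRUE, stringsAsFactors = FALSE)

result <- df1 %>%
  distinct(Date, DayOfWeek, Url) %>%       # keep one row per URL per date and weekday
  count(Date, DayOfWeek, name = "N") %>%   # unique URLs on each date
  group_by(DayOfWeek) %>%
  summarise(N = mean(N))                   # average per-date count for each weekday

# Sort so that the busiest weekday comes first.
busiest <- result %>% arrange(desc(N))
print(busiest)
```

On this data the first row of `busiest` should be Thursday, with an average of (3 + 4 + 3) / 3 unique URLs per Thursday.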