How to count unique values with dcast in R

Abh*_*bhi 5 transpose r dcast

I am using dcast to transpose the following table:

date               event          user_id
25-07-2020         Create          3455
25-07-2020         Visit           3567
25-07-2020         Visit           3567
25-07-2020         Add             3567
25-07-2020         Add             3678
25-07-2020         Add             3678
25-07-2020         Create          3567
24-07-2020         Edit            3871

I am transposing with dcast so that the events become columns and the user_id values are counted:

dae_summ <- dcast(ahoy_events, date ~ event, value.var="user_id")

But I am not getting a count of unique user IDs: the same user_id is counted multiple times. How can I make each user_id count only once per date and event?

akr*_*run 1

We can use uniqueN from data.table as the aggregation function:

library(data.table)
# value.var is given explicitly so dcast does not have to guess the column
dcast(setDT(ahoy_events), date ~ event, value.var = "user_id", fun.aggregate = uniqueN)
#         date Add Create Edit Visit
#1: 24-07-2020   0      0    1     0
#2: 25-07-2020   2      2    0     1
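The overcount in the question comes from the default aggregation: when several rows fall into one date/event cell and no fun.aggregate is supplied, dcast falls back to length, which counts rows rather than distinct values. A minimal sketch comparing the two aggregations follows. It rebuilds the sample data inline, and the names row_counts and unique_counts are only illustrative.

```r
library(data.table)

# Sample data rebuilt inline (same values as the table in the question)
ahoy_events <- data.table(
  date    = c(rep("25-07-2020", 7), "24-07-2020"),
  event   = c("Create", "Visit", "Visit", "Add", "Add", "Add", "Create", "Edit"),
  user_id = c(3455L, 3567L, 3567L, 3567L, 3678L, 3678L, 3567L, 3871L)
)

# length counts rows, so a repeated user_id is counted every time it appears
row_counts <- dcast(ahoy_events, date ~ event,
                    value.var = "user_id", fun.aggregate = length)

# uniqueN counts distinct user_ids within each date/event cell
unique_counts <- dcast(ahoy_events, date ~ event,
                       value.var = "user_id", fun.aggregate = uniqueN)

print(row_counts)
print(unique_counts)
```

For 25-07-2020 the Add cell holds three rows but only two distinct users, so the two tables differ there and in the Visit column.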

Or use pivot_wider from tidyr, with values_fn set to n_distinct:

library(tidyr)
library(dplyr)
ahoy_events %>%
   pivot_wider(names_from = event, values_from = user_id, 
      values_fn = list(user_id = n_distinct), values_fill = list(user_id = 0))
# A tibble: 2 x 5
#   date       Create Visit   Add  Edit
#  <chr>       <int> <int> <int> <int>
#1 25-07-2020      2     1     2     0
#2 24-07-2020      0     0     0     1
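The same idea can also be expressed without any packages, as a rough base R sketch that is not part of the original answer: drop the repeated (date, event, user_id) rows first, then cross-tabulate what is left. The names deduped and counts are only illustrative, and the sample data is rebuilt inline.

```r
# Sample data rebuilt inline (same values as the table in the question)
ahoy_events <- data.frame(
  date    = c(rep("25-07-2020", 7), "24-07-2020"),
  event   = c("Create", "Visit", "Visit", "Add", "Add", "Add", "Create", "Edit"),
  user_id = c(3455L, 3567L, 3567L, 3567L, 3678L, 3678L, 3567L, 3871L),
  stringsAsFactors = FALSE
)

# Keep one row per distinct (date, event, user_id) combination
deduped <- unique(ahoy_events[, c("date", "event", "user_id")])

# Each remaining row is one distinct user, so a plain cross-tabulation
# of date by event now gives the unique-user counts
counts <- xtabs(~ date + event, data = deduped)

# Convert to an ordinary wide data frame if that is more convenient
counts_df <- as.data.frame.matrix(counts)
print(counts_df)
```

Because the duplicates are removed up front, any row-counting tool gives the unique counts afterwards. The trade-off is an extra copy of the data.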

Data

ahoy_events <- structure(list(date = c("25-07-2020", "25-07-2020", "25-07-2020", 
"25-07-2020", "25-07-2020", "25-07-2020", "25-07-2020", "24-07-2020"
), event = c("Create", "Visit", "Visit", "Add", "Add", "Add", 
"Create", "Edit"), user_id = c(3455L, 3567L, 3567L, 3567L, 3678L, 
3678L, 3567L, 3871L)), class = "data.frame", row.names = c(NA, 
-8L))