I'm running a ddply call and keep getting an error.
Structure of the data.frame:
str(visits.by.user)
'data.frame': 80317 obs. of 5 variables:
$ ClientID : Factor w/ 147792 levels "50912733","50098716",..: 1 3 4 5 6 7 8 10 11 12 ...
$ TotalVisits : int 64 231 18 21 416 290 3 13 1 7 ...
$ TotalDayVisits: int 8 141 0 4 240 155 0 0 0 0 ...
$ TotalNightVisits: int 56 90 18 17 176 135 3 13 1 7 ...
$ quintile : Factor w/ 5 levels "0-20","20-40",..: 5 5 4 4 5 5 2 4 1 3 ...
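Side note: I know how to create sample data for random numeric columns, but how do I add a factor with 5 levels to build a representative sample?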
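On the side note, here is a minimal sketch of one way to build such a sample using base R only. The name `sample.visits`, the sample size, the seed, and the Poisson rates are all assumptions made for illustration; only the column names and level labels come from the `str()` output above. The quintile factor is derived by binning the rank of `TotalVisits`, which is just one reasonable choice.

```r
set.seed(42)   # arbitrary seed, only so the sketch is reproducible
n <- 100       # hypothetical sample size

lv <- c("0-20", "20-40", "40-60", "60-80", "80-100")

sample.visits <- data.frame(
  ClientID         = factor(10000000L + sample.int(89999999L, n)),  # made-up 8-digit IDs
  TotalDayVisits   = rpois(n, lambda = 20),                         # made-up rates
  TotalNightVisits = rpois(n, lambda = 30)
)
sample.visits$TotalVisits <- sample.visits$TotalDayVisits +
  sample.visits$TotalNightVisits

# Bin the ranks (ties broken by order of appearance) into five equal-sized
# groups; cut() returns a factor carrying the five labels in lv.
sample.visits$quintile <- cut(
  rank(sample.visits$TotalVisits, ties.method = "first"),
  breaks = seq(0, n, length.out = 6),
  labels = lv
)

str(sample.visits)
```

If the levels do not need to relate to the numeric columns at all, `factor(sample(lv, n, replace = TRUE), levels = lv)` would also give a 5-level factor.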
The ddply code:
summary.users <- ddply(data = subset(visits.by.user, TotalVisits > 0),
.(quintile, TotalDayVisits, TotalNightVisits),
summarize,
NumClients = length(ClientID))
The error message:
Error in if (empty(.data)) return(.data) :
missing value where TRUE/FALSE needed
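I thought ddply might need the variables I'm grouping by to be factors, so I tried as.factor on the integer variables, but that didn't work.
Can anyone see where I'm going wrong?
EDIT: added the dput of the first rows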
structure(list(ClientID = structure(c(1L, 2L, 3L, 4L, 5L, 6L), .Label = c("50912733", "60098716", "50087112", "94752212", "78217771", "12884545"), class = "factor"),TotalVisits = c(80L, 92L, 103L, 18L, 182L, 136L), TotalDayVisits = c(56L, 90L, 18L, 17L, 176L, 135L), TotalNightVisits = c(24L, 2L, 85L, 1L, 6L, 1L), quintile = structure(c(5L, 5L, 4L, 4L, 5L, 5L), .Label = c("0-20", "20-40", "40-60", "60-80", "80-100"), class = "factor")), .Names = c("ClientID", "TotalVisits", "TotalDayVisits", "TotalNightVisits", "quintile"), row.names = c(NA,6L), class = "data.frame")
You named your first argument data=, but the first argument of ddply is called .data. If I change that, your code runs fine.
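As a sketch of that fix, the call below uses the six rows from the dput in the question and changes only the argument name. The likely mechanism, as far as I can tell, is that `data` is not a formal argument of ddply, so it gets absorbed by `...` and the remaining unnamed arguments shift into `.data` and `.variables`.

```r
library(plyr)

# The six rows from the dput() in the question, rebuilt by hand
visits.by.user <- data.frame(
  ClientID = factor(c("50912733", "60098716", "50087112",
                      "94752212", "78217771", "12884545")),
  TotalVisits      = c(80L, 92L, 103L, 18L, 182L, 136L),
  TotalDayVisits   = c(56L, 90L, 18L, 17L, 176L, 135L),
  TotalNightVisits = c(24L, 2L, 85L, 1L, 6L, 1L),
  quintile = factor(c("80-100", "80-100", "60-80", "60-80", "80-100", "80-100"),
                    levels = c("0-20", "20-40", "40-60", "60-80", "80-100"))
)

# Same call as in the question, with data= renamed to .data=
summary.users <- ddply(.data = subset(visits.by.user, TotalVisits > 0),
                       .(quintile, TotalDayVisits, TotalNightVisits),
                       summarize,
                       NumClients = length(ClientID))
summary.users
```

Passing the data frame positionally (with no name at all) should work just as well.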
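As for my comment: unused factor levels are a problem I thought I had run into in the past, but there seems to be an implicit call to something like droplevels inside ddply. I'd love to understand in more depth how that works!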
dat <- data.frame(x=1:20, z=factor(rep(letters[1:4], each=5)))
ddply(dat, .(z), summarise, length(x))
z ..1
1 a 5
2 b 5
3 c 5
4 d 5
ddply(subset(dat, z!='a'), .(z), summarise, length(x))
z ..1
1 b 5
2 c 5
3 d 5
That behaves nicely. However, looking at the factor levels within each group surprised me:
ddply(subset(dat, z!='a'), .(z), summarise, paste(levels(z), collapse=' '))
z ..1
1 b a b c d
2 c a b c d
3 d a b c d
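If the leftover level matters for your summary, one option is to drop unused levels explicitly before calling ddply. The sketch below does not explain what ddply does internally; it only shows base R's `droplevels` applied to the subset, and the names `kept`, `res` and `lvls` are made up for the example.

```r
library(plyr)

dat <- data.frame(x = 1:20, z = factor(rep(letters[1:4], each = 5)))

# subset() keeps the unused level "a"; droplevels() removes it
kept <- droplevels(subset(dat, z != "a"))
levels(kept$z)

# The per-group levels now list only the groups that are present
res <- ddply(kept, .(z), summarise, lvls = paste(levels(z), collapse = " "))
res
```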