Indexing grouped_df objects

Question


Trying to select columns of a grouped_df object by index gives "Error: index out of bounds". For example:

x <- mtcars %>% group_by(am, gear) %>% summarise_each(funs(sum), disp, hp, drat)
class(x)
#    "grouped_df" "tbl_df"     "tbl"        "data.frame"
# For some reason the first column can be selected...
x[1]
#    Source: local data frame [4 x 1]
#    Groups: am
#    am
#     0
#     0
#     1
#     1    
# ...but any index > 1 fails
x[2] 
#   Error: index out of bounds
# Coercing to data frame does the trick...
as.data.frame(x)[2]
#   gear
#      3
#      4
#      4
#      5
#... and so does ungrouping
all(ungroup(x)[2] == as.data.frame(x)[2]) # TRUE


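This is with R 3.1.1 and dplyr 0.3.0.2. I'm not sure whether this is a bug or intentional. Is there a good reason for it to work this way? I'd rather not have to remember to ungroup my data frames every time I use dplyr...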

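Update: Having looked into this, my guess is that [.grouped_df is defined this way so that the groups are preserved when calling, e.g., x[1:3] (which works). However, when the selected columns do not include the grouping variable(s), the error above is thrown. Perhaps it could be changed so that in that case it calls [.tbl_df instead and issues a warning...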
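The fallback suggested in the update can be approximated without patching dplyr by a small wrapper. This is only a sketch: safe_subset() is a hypothetical helper, not a dplyr function, and it assumes a recent dplyr (>= 1.0), since group_vars() and across() do not exist in 0.3.x. It warns and subsets an ungrouped copy whenever the selection would drop a grouping variable.

```r
library(dplyr)

# Hypothetical helper, NOT part of dplyr: subset columns of a data frame;
# if the selection would drop a grouping variable of a grouped_df,
# warn and subset an ungrouped copy instead of erroring.
safe_subset <- function(df, j) {
  if (!inherits(df, "grouped_df")) {
    return(df[j])
  }
  # Resolve j (numeric, logical or character) to column names.
  cols <- if (is.character(j)) j else names(df)[j]
  dropped <- setdiff(group_vars(df), cols)
  if (length(dropped) > 0) {
    warning("grouping variable(s) ", paste(dropped, collapse = ", "),
            " not selected; returning an ungrouped result")
    return(ungroup(df)[j])
  }
  df[j]
}

# Same summary as in the question, written with across() because
# summarise_each()/funs() are deprecated in current dplyr.
x <- mtcars %>%
  group_by(am, gear) %>%
  summarise(across(c(disp, hp, drat), sum), .groups = "drop_last")

safe_subset(x, 1:2)  # keeps `am`, so the grouping is preserved
safe_subset(x, 2)    # drops `am`: warns and returns an ungrouped tibble
```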

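Update 2: [.grouped_df has been modified in the development version of dplyr (0.3.0.9000). It still throws an error, but the message is now clearer and names the grouping variables that were not included: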

x[2]
# Error in `[.grouped_df`(x, 2) : 
#     cannot group, grouping variables 'am' not included


The best workaround I have found is to add %>% ungroup to the end of my dplyr chains; with that, my code no longer crashes in this situation.
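A minimal sketch of the ungroup() workaround, assuming a recent dplyr (>= 1.0); the summary is rewritten with across() because summarise_each()/funs() are deprecated in current releases. Once the grouping is removed, positional indexing behaves like on an ordinary tibble.

```r
library(dplyr)

# The question's pipeline, ending with ungroup() so no grouping survives.
x <- mtcars %>%
  group_by(am, gear) %>%
  summarise(across(c(disp, hp, drat), sum), .groups = "drop_last") %>%
  ungroup()

# With no grouping left, x[2] is a plain column subset.
gear_col <- x[2]
print(gear_col)
```

In dplyr 1.0 and later, summarise(..., .groups = "drop") gives the same ungrouped result without a separate ungroup() call.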

Answer 1

小智 0

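For grouped data (after group_by), the [ function cannot subset a data frame's columns unless the grouping variables are part of the selection. See the issue for details.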
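Consistent with this answer, the error can also be avoided without ungrouping by keeping the grouping variable in the selection, or by extracting a plain vector. A sketch assuming a recent dplyr (>= 1.0); in particular, select() re-adding missing grouping variables (with a message) is its documented behaviour in current releases, not something verified against 0.3.x.

```r
library(dplyr)

x <- mtcars %>%
  group_by(am, gear) %>%
  summarise(across(c(disp, hp, drat), sum), .groups = "drop_last")

# Include the grouping column `am` so the result is still a valid grouped_df.
kept <- x[c("am", "gear")]

# select() on grouped data adds missing grouping variables back (with a message).
sel <- select(x, gear)

# [[ returns a bare vector, so there is no grouping to preserve.
gear_vec <- x[["gear"]]
```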
