Mic*_*use 2 group-by r function dplyr
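Below is a working example of what I want the function to do, followed by the script for the function, with a note marking where the error occurs.
The error message is: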
Error: index out of bounds
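As far as I know, this usually means that R cannot find the variable being called.
Interestingly, in my function example below, if I group only by my subgroup_name (which is passed to the function and becomes a column in the newly created data frame), the function regroups by that variable successfully. However, I also want to group by the newly created column (from the melt) called variable.
Similar code used to work for me with regroup(), but that has been deprecated. I have tried to use group_by_(), but to no avail.
I have read many other posts and answers and spent several hours experimenting today, but still without success.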
# Packages used below: dplyr for the pipeline, reshape2 for melt()
library(dplyr)
library(reshape2)

# Initialize example dataset
database <- ggplot2::diamonds
database$diamond <- row.names(database) # needed for melting
subgroup_name <- "cut" # can replace with "color" or "clarity"
subgroup_column <- 2 # can replace with 3 for color, 4 for clarity

# This works, although it would be preferable not to need separate variables for subgroup_name and subgroup_column number
df <- database %>%
  select(diamond, subgroup_column, x, y, z) %>%
  melt(id.vars = c("diamond", subgroup_name)) %>%
  group_by(cut, variable) %>%
  summarise(value = round(mean(value, na.rm = TRUE), 2))

# This does not work, I am expecting the same output as above
subgroup_analysis <- function(database, ...) {
  df <- database %>%
    select(diamond, subgroup_column, x, y, z) %>%
    melt(id.vars = c("diamond", subgroup_name)) %>%
    group_by_(subgroup_name, variable) %>% # problem appears to be with finding "variable"
    summarise(value = round(mean(value, na.rm = TRUE), 2))
  print(df)
}
subgroup_analysis(database, subgroup_column, subgroup_name)
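From the NSE vignette:
If you also want to output variables to vary, you need to pass a list of quoted objects to the .dots argument:
Here, variable should be quoted: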
subgroup_analysis <- function(database, ...) {
  df <- database %>%
    select(diamond, subgroup_column, x, y, z) %>%
    melt(id.vars = c("diamond", subgroup_name)) %>%
    group_by_(subgroup_name, quote(variable)) %>%
    summarise(value = round(mean(value, na.rm = TRUE), 2))
  print(df)
}
subgroup_analysis(database, subgroup_column, subgroup_name)
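For readers on a recent dplyr, where group_by_() is itself deprecated, the same idea can be sketched without the underscore verbs. This is a minimal sketch under some assumptions, not part of the original answer: it assumes dplyr >= 1.0 and tidyr >= 1.0, it uses tidyr::pivot_longer() in place of melt(), and the names toy and subgroup_analysis_modern are hypothetical. The small toy data frame stands in for ggplot2::diamonds so the example has no other dependencies.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for ggplot2::diamonds
toy <- data.frame(
  cut = c("Ideal", "Ideal", "Good", "Good"),
  x   = c(4.0, 4.2, 5.0, 5.4),
  y   = c(4.1, 4.3, 5.1, 5.5),
  z   = c(2.4, 2.6, 3.0, 3.4),
  stringsAsFactors = FALSE
)

# subgroup_name is a string; all_of() turns it into a column selection,
# so no separate column-number argument is needed.
subgroup_analysis_modern <- function(data, subgroup_name) {
  data %>%
    select(all_of(subgroup_name), x, y, z) %>%
    # long format: one row per (subgroup, variable, value)
    pivot_longer(c(x, y, z), names_to = "variable", values_to = "value") %>%
    # group by the string-named column and the new "variable" column
    group_by(across(all_of(c(subgroup_name, "variable")))) %>%
    summarise(value = round(mean(value, na.rm = TRUE), 2), .groups = "drop")
}

result <- subgroup_analysis_modern(toy, "cut")
```

Passing both grouping columns as strings through all_of() avoids mixing a quoted and an unquoted name, which is what tripped up the original group_by_() call.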
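As @RichardScriven mentioned, if you plan to assign the result to a new variable, you may want to remove the print call at the end and just write df, or not assign df inside the function at all.
Otherwise the result will be printed even when you write x <- subgroup_analysis(...).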
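The difference can be seen in a small base R sketch. The function names with_print and without_print are hypothetical and exist only for illustration:

```r
# Ends with print(): the value is printed every time the function runs,
# even when the caller assigns the result.
with_print <- function(x) {
  out <- x * 2
  print(out)
}

# Ends with the value itself: nothing is printed when the result is
# assigned, and R auto-prints it only at the top level when it is not.
without_print <- function(x) {
  x * 2
}

printed_lines <- capture.output(a <- with_print(1:3))    # captures the printed line
silent_lines  <- capture.output(b <- without_print(1:3)) # captures nothing
```

Both a and b hold the same values, since print() returns its argument invisibly. Only the side effect differs.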