Col*_*in9 2 aggregate r ggplot2 dplyr
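I am trying to write a user-defined function that uses dplyr internally, so that I can pass in several arguments, summarise the data with dplyr, and then plot the result with ggplot.
Here is some sample data, along with the dplyr summary and plot I am trying to produce: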
library(dplyr)
library(ggplot2)

df <- data.frame(
  Year = c("2006", "2006", "2006", "2007", "2007", "2007", "2008", "2009", "2010", "2010", "2009", "2009"),
  JudicialOrientation = c("Defense", "Plaintiff", "Plaintiff", "Neutral", "Defense", "Plaintiff", "Defense", "Plaintiff", "Neutral", "Neutral", "Plaintiff", "Defense"),
  Loss = c(100000, 100, 2500, 100000, 25000, 0, 7500, 5200, 900, 100, 0, 50)
)

# Mean loss per year and judicial orientation
df1 <- df %>%
  group_by(Year, JudicialOrientation) %>%
  summarise(MeanLoss = mean(Loss))
ggplot(df1, aes(x = JudicialOrientation, y = MeanLoss, color = Year, group = Year)) +
  geom_line() +
  geom_point()
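I am now trying to reproduce this inside a user-defined function, so that I can pass in different variables and get a similar result.
This is my attempt so far: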
ConsistencyPlot <- function(df,var1,timevar,lossvar){
  df1 <- df %>%
    group_by_(df[timevar], df[var1]) %>%
    summarise_(MeanLoss = mean(df[lossvar]))
  ggplot(df1, aes(x = var1, y = MeanLoss, color = timevar, group = timevar)) +
    geom_line() +
    geom_point()
}
ConsistencyPlot(df,"JudicialOrientation","Year",'Loss')
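I replicate the same logic, passing df as my data frame, var1 as JudicialOrientation, timevar as Year, and lossvar as Loss, the column I want to average with summarise. I cannot get the same result, so I feel I am missing something about how to use these functions inside a closure.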
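First, inside dplyr functions you do not need to index the data frame to reach a variable, as in df[, timevar]. Use the variable name only. Also, single-bracket indexing such as df[timevar] returns a one-column data frame rather than the column vector itself, so it is not what group_by or mean expect here.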
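To make the indexing point concrete, here is a minimal sketch. The data frame `toy` and the names `one_col_df` and `loss_vec` are hypothetical and are not part of the question's data.

```r
# Hypothetical toy data, not the question's data frame
toy <- data.frame(Loss = c(100, 200, 300))

one_col_df <- toy["Loss"]    # single brackets: still a data frame
loss_vec   <- toy[["Loss"]]  # double brackets: the numeric column itself

is.data.frame(one_col_df)    # the single-bracket result is a data frame
is.numeric(loss_vec)         # the double-bracket result is a numeric vector
mean(loss_vec)               # mean() works on the vector
```

Calling mean() on `one_col_df` instead returns NA with a warning, because a data frame is not a numeric vector. That is one reason mean(df[lossvar]) in the attempt above cannot produce the expected summary.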
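As for the function itself, this is an evaluation problem: dplyr verbs and aes() capture their arguments as unevaluated expressions and look them up as columns, so a name passed through a function argument has to be captured and then unquoted explicitly.
The following structure works: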
ConsistencyPlot <- function(df, var1, timevar, lossvar){
  # Capture the unquoted column names supplied by the caller
  var1 <- enquo(var1)
  timevar <- enquo(timevar)
  lossvar <- enquo(lossvar)
  # Unquote them with !! wherever a column is expected
  df1 <- df %>%
    group_by(!!timevar, !!var1) %>%
    summarise(MeanLoss = mean(!!lossvar))
  ggplot(df1, aes(x = !!var1, y = MeanLoss, color = !!timevar, group = !!timevar)) +
    geom_line() +
    geom_point()
}
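Note that the arguments are captured with enquo() and then passed to the dplyr and ggplot2 functions with !!. As a result, you can pass the column names to the function without quoting them: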
ConsistencyPlot(df, JudicialOrientation, Year, Loss)
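If you would rather keep passing the column names as strings, as in the original call, one possible alternative is the `.data` pronoun. The sketch below is only an illustration under some assumptions: the function name `ConsistencyPlotStr` and the data frame `demo` are hypothetical, and the `.groups` argument assumes dplyr 1.0.0 or later, while `.data` inside aes() assumes ggplot2 3.0.0 or later.

```r
library(dplyr)
library(ggplot2)

# Hypothetical copy of the question's sample data
demo <- data.frame(
  Year = c("2006", "2006", "2006", "2007", "2007", "2007",
           "2008", "2009", "2010", "2010", "2009", "2009"),
  JudicialOrientation = c("Defense", "Plaintiff", "Plaintiff", "Neutral",
                          "Defense", "Plaintiff", "Defense", "Plaintiff",
                          "Neutral", "Neutral", "Plaintiff", "Defense"),
  Loss = c(100000, 100, 2500, 100000, 25000, 0, 7500, 5200, 900, 100, 0, 50)
)

# Hypothetical string-based variant: each column argument is a character string
ConsistencyPlotStr <- function(df, var1, timevar, lossvar) {
  # .data[[name]] looks up the column whose name is stored in the string
  df1 <- df %>%
    group_by(.data[[timevar]], .data[[var1]]) %>%
    summarise(MeanLoss = mean(.data[[lossvar]]), .groups = "drop")
  ggplot(df1, aes(x = .data[[var1]], y = MeanLoss,
                  color = .data[[timevar]], group = .data[[timevar]])) +
    geom_line() +
    geom_point()
}

p <- ConsistencyPlotStr(demo, "JudicialOrientation", "Year", "Loss")
```

Printing `p` draws the plot. Whether strings or bare names are more convenient depends on how the function is called: strings are easier when the column names come from a vector or a loop, while the enquo() version reads more like ordinary dplyr code.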
Hope this helps.