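I am struggling to find a suitable display for illustrating various properties within and across school classes. Each class has only 15-30 data points (students).
Right now I am leaning towards a whisker-less box plot that shows only the 1st, 2nd and 3rd quartiles, plus the data points lying more than e.g. 1 population SD away from the sample median.
This I can do.
However - I need to show this graph to some teachers to gauge what they like best, and I want to compare my graph with an ordinary box plot. But an ordinary box plot looks the same whether there is a single outlier or, say, 5 outliers at the same value. In this case that would be a deal-breaker.
For example: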
test <-structure(list(value = c(3, 5, 3, 3, 6, 4, 5, 4, 6, 4, 6, 4,
4, 6, 5, 3, 3, 4, 4, 4, 3, 4, 4, 4, 3, 4, 5, 6, 6, 4, 3, 5, 4,
6, 5, 6, 4, 5, 5, 3, 4, 4, 6, 4, 4, 5, 5, 3, 4, 5, 8, 8, 8, 8,
9, 6, 6, 7, 6, 9), places = structure(c(1L, 2L, 1L, 1L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L,
2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L,
2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L,
1L, 2L, 2L, 1L, 2L, 1L), .Label = c("a", "b"), class = "factor")), .Names = c("value",
"places"), row.names = c(NA, -60L), class = "data.frame")
library(ggplot2)
ggplot(test, aes(x = places, y = value)) + geom_boxplot()
Here there are two outliers at ("a", 9) - but only one "dot" is shown.
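To see how many observations sit behind that single dot, one option is to tabulate the flagged points per group and value. This is only a minimal sketch in base R: `is_outlier` and `outlier_counts` are names made up here, the data frame is a compact stand-in meant to reproduce the per-group value counts of `test`, and the 1.5 * IQR rule on `quantile()` quartiles is assumed to match what geom_boxplot() uses.

```r
# Compact stand-in for `test` (assumption: same value counts per place)
test <- data.frame(
  value  = c(rep(c(3, 4, 5, 6, 7, 8, 9), times = c(7, 9, 6, 4, 1, 1, 2)),
             rep(c(3, 4, 5, 6, 8),       times = c(3, 11, 5, 8, 3))),
  places = factor(rep(c("a", "b"), each = 30))
)

# Hypothetical helper: TRUE for points beyond coef * IQR from the quartiles
is_outlier <- function(y, coef = 1.5) {
  q <- quantile(y, c(0.25, 0.75), names = FALSE)
  iqr <- diff(q)
  y < q[1] - coef * iqr | y > q[2] + coef * iqr
}

# ave() applies the helper within each place; TRUE/FALSE come back as 1/0
flagged <- test[ave(test$value, test$places, FUN = is_outlier) == 1, ]

# Count how many flagged observations share each (places, value) position
outlier_counts <- as.data.frame(table(places = flagged$places,
                                      value  = flagged$value))
outlier_counts <- outlier_counts[outlier_counts$Freq > 0, ]
print(outlier_counts)
```

Any row with `Freq` greater than 1 marks a position where several outliers are drawn on top of each other.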
So my question is: how do I jitter the outliers? And - what kind of display would you suggest for this type of data?
You can redefine the function:
GeomBoxplot$draw <- function(., data, ..., outlier.colour = "black", outlier.shape = 16,
                             outlier.size = 2, outlier.jitter = 0)
{
  defaults <- with(data, data.frame(x = x, xmin = xmin, xmax = xmax,
    colour = colour, size = size, linetype = 1, group = 1,
    alpha = 1, fill = alpha(fill, alpha), stringsAsFactors = FALSE))
  defaults2 <- defaults[c(1, 1), ]
  if (!is.null(data$outliers) && length(data$outliers[[1]]) >= 1) {
    # Jitter the outliers horizontally only, so their y values stay exact
    pp <- position_jitter(width = outlier.jitter, height = 0)
    p <- pp$adjust(data.frame(x = data$x[rep(1, length(data$outliers[[1]]))],
                              y = data$outliers[[1]]), .scale)
    outliers_grob <- GeomPoint$draw(data.frame(x = p$x, y = p$y,
      colour = I(outlier.colour), shape = outlier.shape, alpha = 1,
      size = outlier.size, fill = NA), ...)
  }
  else {
    outliers_grob <- NULL
  }
  # Draw the outliers, both whiskers, the box and the median line
  with(data, ggname(.$my_name(), grobTree(outliers_grob,
    GeomPath$draw(data.frame(y = c(upper, ymax), defaults2), ...),
    GeomPath$draw(data.frame(y = c(lower, ymin), defaults2), ...),
    GeomRect$draw(data.frame(ymax = upper, ymin = lower, defaults), ...),
    GeomRect$draw(data.frame(ymax = middle, ymin = middle, defaults), ...))))
}
ggplot(test, aes(x=places,y=value))+geom_boxplot(outlier.jitter=0.05)
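This is an ad-hoc solution. Of course, in OOP terms, you should create a subclass of GeomBoxplot and override the function. That is easy because ggplot2 is nicely designed.
=== added an example subclass definition ===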
GeomBoxplotJitterOutlier <- proto(GeomBoxplot, {
  draw <- function(., data, ..., outlier.colour = "black", outlier.shape = 16,
                   outlier.size = 2, outlier.jitter = 0) {
    # copy the body of function 'draw' above and paste here.
  }
  objname <- "boxplot_jitter_outlier"
  desc <- "Box and whiskers plot with jittered outlier"
  guide_geom <- function(.) "boxplot_jitter_outlier"
})
geom_boxplot_jitter_outlier <- GeomBoxplotJitterOutlier$build_accessor()
Then you can use your subclass like this:
ggplot(test, aes(x=places,y=value))+geom_boxplot_jitter_outlier(outlier.jitter=0.05)
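It seems the accepted answer no longer works, since ggplot2 has been updated. After a lot of searching online I found the following: http://comments.gmane.org/gmane.comp.lang.r.ggplot2/3616 - look at Winston Chang's reply.
He computes the outliers separately with ddply and then plots them using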
geom_dotplot()
with the outlier output of geom_boxplot() switched off:
geom_boxplot(outlier.colour = NA)
Here is the full code from the URL above:
library(ggplot2)
# This returns only the outlying values of y (beyond coef * IQR from the quartiles)
find_outliers <- function(y, coef = 1.5) {
  qs <- c(0, 0.25, 0.5, 0.75, 1)
  stats <- as.numeric(quantile(y, qs))
  iqr <- diff(stats[c(2, 4)])
  outliers <- y < (stats[2] - coef * iqr) | y > (stats[4] + coef * iqr)
  return(y[outliers])
}
library(MASS) # Use the birthwt data set from MASS
# Find the outliers for each level of 'smoke'
library(plyr)
outlier_data <- ddply(birthwt, .(smoke), summarise, lwt = find_outliers(lwt))
# This draws an ordinary box plot
ggplot(birthwt, aes(x = factor(smoke), y = lwt)) + geom_boxplot()
# This draws the outliers using geom_dotplot
ggplot(birthwt, aes(x = factor(smoke), y = lwt)) +
geom_boxplot(outlier.colour = NA) +
#also consider:
# geom_jitter(alpha = 0.5, size = 2)+
geom_dotplot(data = outlier_data, binaxis = "y",
stackdir = "center", binwidth = 4)
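The same idea can also be applied to the `test` data from the question, with jittered points instead of a dot plot. This is a rough sketch rather than Winston Chang's code: it uses base R in place of plyr, rebuilds `test` as a compact stand-in (assumed to have the same per-place value counts as the original), and assumes a reasonably recent ggplot2 in which `outlier.shape = NA` hides the built-in outlier points. The name `outlier_list` is made up for the example.

```r
library(ggplot2)

# Compact stand-in for `test` (assumption: same value counts per place)
test <- data.frame(
  value  = c(rep(c(3, 4, 5, 6, 7, 8, 9), times = c(7, 9, 6, 4, 1, 1, 2)),
             rep(c(3, 4, 5, 6, 8),       times = c(3, 11, 5, 8, 3))),
  places = factor(rep(c("a", "b"), each = 30))
)

# Return the values of y lying beyond coef * IQR from the quartiles
find_outliers <- function(y, coef = 1.5) {
  q <- quantile(y, c(0.25, 0.75), names = FALSE)
  iqr <- diff(q)
  y[y < q[1] - coef * iqr | y > q[2] + coef * iqr]
}

# One row per outlying observation, computed separately for each place
outlier_list <- lapply(split(test$value, test$places), find_outliers)
outlier_data <- data.frame(
  places = factor(rep(names(outlier_list), lengths(outlier_list)),
                  levels = levels(test$places)),
  value  = as.numeric(unlist(outlier_list, use.names = FALSE))
)

# Hide the built-in outliers, then overlay the jittered ones.
# height = 0 keeps the y values exact; only x is perturbed.
p <- ggplot(test, aes(x = places, y = value)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(data = outlier_data,
             position = position_jitter(width = 0.05, height = 0))
print(p)
```

With this data both observations at ("a", 9) are drawn as separate points. The jitter width of 0.05 is an arbitrary choice and may need tuning when many outliers share a value.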