Mark a point in a boxplot

use*_*846 2 visualization r ggplot2

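I am using ggplot2 to plot three different sets as three boxplots on one page. In each set I want to highlight one point and show where it lies compared to the other points: is it inside the box, or outside?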

Here are my data points:

      CDH    1KG    NHLBI
CDH   301    688    1762
RS0   204    560    21742
RS1   158    1169   1406
RS2   182    1945   1467
RS3   256    2371   1631
RS4   198    580    1765
RS5   193    524    1429
RS6   139    2551   1469
RS7   188    702    1584
RS8   142    4311   1461
RS9   223    916    1591
RS10  250    794    1406
RS11  185    539    1270
RS12  228    641    1786
RS13  152    557    1677
RS14  225    1970   1619
RS15  196    458    1543
RS16  203    2891   1528
RS17  221    1542   1780
RS18  258    1173   1850
RS19  202    718    1651
RS20  191    6314   1564


library(ggplot2) 
rm(list = ls())
orig_table = read.table("thedata.csv", header = TRUE, sep = ",")
bb = orig_table # keep a copy of the data
bb = bb[,-1] # the points in the first row are the ones I am interested in, so I exclude them from the sets for the time being
tt = bb
mydata = cbind(c(tt[,1], tt[,2], tt[,3]), c(rep(1,22),rep(2,22),rep(3,22))) # I form the dataframe
data2 = cbind(c(301,688,1762),c(1,2,3)) # these are the points I want to highlight (the same values as the first row)
colnames(data2) = c("num","gro")
data2 = as.data.frame(data2) # I form them as a dataframe 

colnames(mydata) = c("num","gro")
mydata = as.data.frame(mydata)
mydata$gro = factor(mydata$gro, levels=c(1,2,3))
qplot(gro, num, data=mydata, geom=c("boxplot"))+scale_y_log10() # I am making the boxplots out of the 21 other points
# and here I want to highlight those three values in the "data2" dataframe

Thanks for your help.

Bro*_*ieG 5

First, ggplot is much easier to work with if your data is in long format. melt from reshape2 helps:

library(reshape2)
library(ggplot2)
df$highlight <- c(TRUE, rep(FALSE, nrow(df) - 1L))  # tag first row as interesting
df.2 <- melt(df)  # convert df to long format
ggplot(subset(df.2, !highlight), aes(x=variable, y=value)) + 
  geom_boxplot() + scale_y_log10() +
  geom_point(                               # add the highlight points
    data=subset(df.2, highlight), 
    aes(x=variable, y=value), 
    color="red", size=5
  )

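All I did was add a highlight column that is TRUE only for the first row and melt the data into the long format ggplot works with. I then drew the boxplots from the non-highlighted rows and added the highlight==TRUE rows on top as points.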
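The question also asks whether each highlighted point is inside the box. The plot answers that visually. Here is a minimal base-R sketch that answers it numerically instead.

It rests on a few assumptions:

- The hinges are computed the way ggplot2's stat_boxplot() computes them, with quantile() and its default type 7.
- They are computed on the log10 values, since scale_y_log10() transforms the data before the statistics are calculated.
- The outlier cut-offs are the usual 1.5 × IQR fences.

The helper classify_point() and its labels are hypothetical, not part of ggplot2. The sketch also adds check.names = FALSE to read.table(), which the data above does not use, so that the column keeps the name 1KG instead of becoming X1KG.

```r
# Sketch: where does each highlighted value sit relative to its boxplot?
# Assumptions: hinges use quantile() type 7 (as ggplot2's stat_boxplot does),
# computed on log10 values because the plot uses scale_y_log10().
# classify_point() is a hypothetical helper, not part of ggplot2.

df <- read.table(text = "    CDH     1KG     NHLBI
CDH 301     688     1762
RS0 204     560     21742
RS1 158     1169    1406
RS2 182     1945    1467
RS3 256     2371    1631
RS4 198     580     1765
RS5 193     524     1429
RS6 139     2551    1469
RS7 188     702     1584
RS8 142     4311    1461
RS9 223     916     1591
RS10 250    794     1406
RS11 185    539     1270
RS12 228    641     1786
RS13 152    557     1677
RS14 225    1970    1619
RS15 196    458     1543
RS16 203    2891    1528
RS17 221    1542    1780
RS18 258    1173    1850
RS19 202    718     1651
RS20 191    6314    1564", header = TRUE, check.names = FALSE)

classify_point <- function(value, others) {
  y <- log10(others)                                   # same scale as the plot
  hinges <- quantile(y, c(0.25, 0.75), names = FALSE)  # lower and upper hinge
  iqr <- hinges[2] - hinges[1]
  v <- log10(value)
  if (v >= hinges[1] && v <= hinges[2]) {
    "inside box"
  } else if (v >= hinges[1] - 1.5 * iqr && v <= hinges[2] + 1.5 * iqr) {
    "outside box, within 1.5*IQR"
  } else {
    "outlier"
  }
}

# Compare the first row (the highlighted values) with the remaining rows,
# mirroring subset(df.2, !highlight) in the plot.
result <- sapply(names(df), function(col) classify_point(df[1, col], df[-1, col]))
print(result)
```

With this data, the sketch should report these classifications:

- 688 (1KG) falls inside the box.
- 1762 (NHLBI) lies above the box but within the upper fence.
- 301 (CDH) lies beyond the upper fence.

Each set has 21 non-highlighted values, so both hinges are exact order statistics and the box is the same on the raw and log scales. Only the fences depend on computing the IQR on the log10 scale.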


Edit: this is how I created the data:

df <- read.table(text="    CDH     1KG     NHLBI
CDH 301     688     1762
RS0 204     560     21742
RS1 158     1169    1406
RS2 182     1945    1467
RS3 256     2371    1631
RS4 198     580     1765
RS5 193     524     1429
RS6 139     2551    1469
RS7 188     702     1584
RS8 142     4311    1461
RS9 223     916     1591
RS10 250    794     1406
RS11 185    539     1270
RS12 228    641     1786
RS13 152    557     1677
RS14 225    1970    1619
RS15 196    458     1543
RS16 203    2891    1528
RS17 221    1542    1780
RS18 258    1173    1850
RS19 202    718     1651
RS20 191    6314    1564", header=T)