TMC*_*TMC 11
Tags: plot, r, ggplot2, gtable, facet-grid
I want to draw lines between the facets of a faceted ggplot. The main goal is to connect the measurements that we want to test against each other. So basically I want to insert some kind of significance bars inside and between the facets of a ggplot boxplot (or any other kind of plot, for that matter).
I know that the ggsignif package does this for non-faceted plots.
There are answers that try to work around this limitation, such as "Using ggsignif with grouped bar graphs and facet_wrap not working".
There is an approach that could be modified for my purpose, from the question "ggplot, drawing line between points across facets", but a major drawback of that solution is that one has to specify the `l` parameter of `gtable_add_grob` manually. I was not able to figure out how to automate the `l` parameter using only the names of the facet panels we want to start and end with. `$layout$name` only holds seemingly arbitrary names like "panel-1-1", which is the actual panel name, but how would one get from that to the `l` parameter needed to specify the start and end of the line?
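I want to automate the process of drawing lines between facets; for more background, see my post on valid comparisons with multiple grouping variables. Ultimately, I want to use this to annotate plots automatically, to visualise which comparisons are valid, and possibly also to add significance bars to the plot like ggsignif does, but for faceted plots.
This is the data we are working with: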
```r
# Create a dummy dataframe
df <- expand.grid(
  St = 1:10,
  MAT = c("A", "B", "C"),
  TREAT = factor(1:2)
)
df$St <- rnorm(nrow(df))
df$OPERATOR <- rep(c("TM", "CX"), each = 5, length.out = nrow(df))
# numbers are randomly generated, so this is different each time
head(df)
```
|   | St | MAT | TREAT | OPERATOR |
|---|---|---|---|---|
| 1 | -0.488805635 | A | 1 | TM |
| 2 | 2.658658027 | A | 1 | TM |
| 3 | 1.680278205 | A | 1 | TM |
| 4 | 0.779584009 | A | 1 | TM |
| 5 | 0.713240520 | A | 1 | TM |
| 6 | -0.542881937 | A | 1 | CX |
And this is the plot:
```r
library(ggplot2)

# ggplot with multiple facets (nested)
p <- ggplot(data = df,
            aes(x = TREAT,
                y = St,
                color = MAT)) +
  geom_boxplot() +
  ggh4x::facet_nested(~ MAT + OPERATOR) +
  theme_classic()
p
```
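I would like to draw lines between the facets like this: within a facet (blue), even within a subfacet (green), and between different facets (black). The y positions here were chosen arbitrarily, but the order should be similar to the one shown. The plot was made with Inkscape.
As you can see, we have a simple `facet_nested` boxplot with lines between some of the data points, to demonstrate between which data I want to draw horizontal lines.
Using Google and some LLMs (ChatGPT or Bing), I was able to create some code that automatically selects the facets and subfacets.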
```r
# Build the plot
gb <- ggplot_build(p)
# Get panel parameters
ranges <- gb$layout$panel_params
# Facet and subfacet of the first endpoint
mat_name1 <- "A"       # facet
operator_name1 <- "TM" # subfacet
# vs
# Facet and subfacet of the second endpoint
mat_name2 <- "B"       # facet
operator_name2 <- "TM" # subfacet
# x axis name
x_name <- "1"
# Find the index of the panel that corresponds to the specified facet and subfacet
# TODO: replace MAT and OPERATOR with strings so this adapts to the grouping column names
panel_index1 <- which(gb$layout$layout$MAT == mat_name1 & gb$layout$layout$OPERATOR == operator_name1)
panel_index2 <- which(gb$layout$layout$MAT == mat_name2 & gb$layout$layout$OPERATOR == operator_name2)
```
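This returns the correct panel numbers for the facet and subfacet (in this case 1 and 3). But I cannot extract the exact coordinates of A-TM-1 versus B-TM-1 (especially the 1).
In addition, I do not know how to use that information to draw lines on the plot.
I hope someone can understand what I am trying to achieve and help me understand the logic of the `ggplot_build` information, so that I can extract the positions and use them to draw lines on the plot. Ideally, I could write a function that takes a set of facet and subfacet combinations, including the x axis position (in this example 1 or 2, or any other x axis label), and draws these lines between the boxes (e.g. `c(c('A', 'TM', '1'), c('B', 'TM', '1'))`). A more comprehensive understanding of how to interpret the output of `ggplot_build` and how these plots are built would be wonderful, though.
The most promising approach is the one from "ggplot, drawing line between points across facets", but I struggle to select the correct `l` parameter automatically.
If someone could point me in the right direction, that would be a great help, since this kind of plot is daily business, and automating the process the way ggsignif does would save a lot of time and errors.
If you need any further information, feel free to ask.
Best,
TMC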
All*_*ron 12
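It is possible to draw lines that connect facets; in fact, there are a few ways to do it, but none of them is easy. If I were doing this, I would want the result to be a ggplot object rather than a ggplot with lines drawn over it (which could itself be done in several different ways).
Any approach that ends with an ordinary ggplot object has to set `clip = "off"` in its coord to allow lines to stretch between panels. Since the facet panels are drawn onto the page sequentially, we also have to set `panel.background` to `element_blank()`. Any grid lines or vertical axis lines will be drawn over your facet-spanning lines, but fortunately the theme you have chosen is well suited to this.
Another issue is that you need to set hard coordinate limits, otherwise the coordinates will simply expand to accommodate your lines. This effectively means that you need all factor levels present in every facet, and that you cannot use `scales = "free_x"`. Again, this is not a problem with your setup.
If you draw the lines with `geom_segment`, it is easy to specify the starting point at the left of the line; the main difficulty is finding the x value at the right. This has to be calculated for each line. Effectively, you need to ask "if this panel's x axis were numeric and extended indefinitely, at what x value would I want the line to end?".
You can write a function to work this out for you, returning a data frame that you can feed to `geom_segment` (or `geom_textsegment` if you want labels):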
```r
# Build a data frame of segment start points and end x positions (xpos)
# for lines that may span several facet panels.
xpos <- function(data, fac1, fac2, xvar, yvals, labels) {

  # End x value of one line, expressed in the x coordinates of its start panel
  get_xpos <- function(data, fac1, fac2, xvar) {
    datafac <- list(xvar = data[[names(xvar)[1]]],
                    fac1 = data[[names(fac1)[1]]],
                    fac2 = data[[names(fac2)[1]]])
    datafac <- lapply(datafac, as.factor)
    datalevs <- lapply(datafac, levels)
    datanum <- lapply(datalevs, function(x) as.numeric(factor(x)))
    # Facet levels count from 0 so that the first panel has no offset
    datanum[-1] <- lapply(datanum[-1], function(x) x - 1)
    # Horizontal offset contributed by each inner (fac2) and outer (fac1) facet level
    datanum$fac2 <- (max(datanum$xvar) + 1/3) * datanum$fac2
    datanum$fac1 <- (max(datanum$xvar + 1/3) + max(datanum$fac2)) * datanum$fac1
    levs <- Map(match, list(unlist(xvar), unlist(fac1), unlist(fac2)), datalevs)
    final_vals <- Map(function(x, i) x[i], datanum, levs)
    facet_add <- final_vals$fac1 + final_vals$fac2
    facet_add[2] - facet_add[1] + final_vals$xvar[2]
  }

  names(fac1[[1]]) <- rep(names(fac1), length(fac1[[1]]))
  names(fac2[[1]]) <- rep(names(fac2), length(fac2[[1]]))
  names(xvar[[1]]) <- rep(names(xvar), length(xvar[[1]]))

  # One end x value per requested line
  x <- sapply(seq_along(xvar[[1]]), function(i) {
    get_xpos(data, fac1[[1]][i], fac2[[1]][i], xvar[[1]][i])
  })

  # Start values of each line, plus its end x value, y value and label
  d <- data.frame(sapply(fac1[[1]], `[`, 1), sapply(fac2[[1]], `[`, 1),
                  sapply(xvar[[1]], `[`, 1), x, yvals[[1]], labels)
  setNames(d, c(names(fac1), names(fac2), names(xvar),
                "xpos", names(yvals), "labels"))
}
```
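To make the offsets in `get_xpos()` concrete, here is a hand calculation for the example data (two `TREAT` levels, two `OPERATOR` levels). It is only a sketch: `panel_step`, `facet_step` and `end_x` are illustrative names that do not appear in the function, and the 1/3 spacing is simply the constant the function uses.

```r
# Hand calculation of the end x value of a facet-spanning line,
# mirroring the offsets used inside get_xpos().
n_x   <- 2  # number of x axis levels (TREAT)
n_sub <- 2  # number of inner facet levels (OPERATOR)

panel_step <- n_x + 1/3                              # moving one inner panel to the right
facet_step <- panel_step + (n_sub - 1) * panel_step  # moving one outer facet to the right

# d_facet / d_sub: how many outer / inner facet levels the line moves to the right
# x_end: x position of the line's end inside its own panel
end_x <- function(d_facet, d_sub, x_end) {
  d_facet * facet_step + d_sub * panel_step + x_end
}

end_x(0, 1, 1)  # A/CX -> A/TM, ending at x = 1
end_x(1, 0, 1)  # A/CX -> B/CX, ending at x = 1
end_x(2, 0, 1)  # A/TM -> C/TM, ending at x = 1
```

The results are the `xend` values that a line starting in the left panel needs in order to finish at x = 1 in the target panel.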
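Calling this function still takes a bit of work, since we need to supply it with lists of the start and end levels of each faceting variable and of the x axis variable: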
```r
segs <- xpos(data = df,
             xvar = list(TREAT = list(c(1, 2), c(1, 1),
                                      c(1, 1), c(1, 1))),
             fac1 = list(MAT = list(c("A", "A"), c("A", "A"),
                                    c("A", "B"), c("A", "C"))),
             fac2 = list(OPERATOR = list(c("CX", "CX"), c("CX", "TM"),
                                         c("CX", "CX"), c("TM", "TM"))),
             yvals = list(St = c(1.5, 1.8, 2.1, 2.4)),
             labels = c("Label 1", "Label 2", "Label 3", "Label 4"))
```
But at least our final plotting code is straightforward:
```r
library(geomtextpath)

ggplot(data = df, aes(x = TREAT, y = St, color = MAT)) +
  geom_boxplot() +
  geom_textsegment(data = segs,
                   aes(xend = xpos, yend = St, group = MAT, label = labels),
                   color = c("green4", "blue", "black", "black"),
                   linewidth = 1, vjust = -0.2) +
  ggh4x::facet_nested(~ MAT + OPERATOR) +
  coord_cartesian(clip = "off", xlim = c(1, 2)) +
  theme_classic() +
  theme(panel.background = element_blank())
```
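Of course, this is all a bit cumbersome. It could be adapted to have an easier-to-use interface, but all the caveats above make me wonder whether it is worth it. It really depends on how you intend to use it.
I am not sure it is advisable to plot these groups in different facets while they are still in the same plot (given that you presumably ran only one statistical test for all comparisons). Here is a slightly more concise way of doing similar work, but it does not involve splitting the data into facets.
One clear advantage of this approach is that it is fully (well, semi-) automated. I also include a suggestion for how to identify valid comparisons. (It does assume, however, that the category names used in each grouping variable are distinct.)
Step 1 is to recreate the example data and load the necessary libraries: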
```r
library(tidyverse)
library(broom)
library(geomtextpath)
library(ggh4x) # I added this for nested axis label

# Create a dummy dataframe
set.seed(2)
df <- expand.grid(
  St = 1:10,
  MAT = c("A", "B", "C"),
  TREAT = factor(1:2)
)
df$St <- rnorm(nrow(df))
df$OPERATOR <- rep(c("TM", "CX"), each = 5, length.out = nrow(df))
```
Step 2 is to define the order of the grouping variables (you can adjust this):
```r
# Pass the full columns so that all three arguments have the same length;
# the resulting levels are the same as with unique() on each column
current_scheme <- levels(interaction(df$TREAT, df$OPERATOR, df$MAT))
```
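The order of this scheme matters because step 3 finds each group's x position by matching a contrast name against it. A minimal base R sketch of that lookup, rebuilding only the grouping columns of the example data (`scheme` and `pos` are illustrative names, and `"1:TM:B"` is just an example group written in the colon-separated form that the Tukey contrasts use):

```r
# Grouping columns of the example data (the random St values are not needed here)
df <- expand.grid(St = 1:10, MAT = c("A", "B", "C"), TREAT = factor(1:2))
df$OPERATOR <- rep(c("TM", "CX"), each = 5, length.out = nrow(df))

# All TREAT.OPERATOR.MAT combinations; the first variable varies fastest
scheme <- levels(interaction(df$TREAT, df$OPERATOR, df$MAT))
scheme

# Position on the x axis of the group TREAT = 1, OPERATOR = TM, MAT = B
pos <- which(scheme == gsub(":", ".", "1:TM:B", fixed = TRUE))
pos
```

Because the character column `OPERATOR` is converted to a factor with alphabetically sorted levels, "CX" comes before "TM" in the scheme.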
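Step 3 is to run the statistical test. Here I use ANOVA as an example. Use it with caution: a statistician might, for instance, advise you to run a stepwise protected ANOVA before performing Tukey's test.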
```r
test_df <- df %>%
  mutate(global_y_max = max(St)) %>% # this can be changed to per group if desired, but then the bars are not guaranteed to clear the data
  ungroup() %>% # make sure to get one tibble for the ANOVA for each y max
  group_by(global_y_max) %>%
  group_modify(~ broom::tidy(TukeyHSD(aov(St ~ as.factor(TREAT) * as.factor(OPERATOR) * as.factor(MAT), data = .x)))) %>% # the variable order should be the same as in current_scheme
  filter(str_count(term, ":") == 2) %>% # only interested in pair-wise comparisons
  dplyr::select(contrast, p = adj.p.value) %>%
  separate(contrast, into = c("first", "second"), sep = "-", remove = FALSE) %>%
  rowwise() %>%
  mutate(first_list = strsplit(first, ":"),
         second_list = strsplit(second, ":")) %>%
  # a comparison is valid when the two groups differ in exactly one grouping variable
  mutate(valid_comparison = length(setdiff(unlist(first_list), unlist(second_list))) == 1) %>%
  filter(valid_comparison == TRUE) %>%
  mutate(first_xpos = which(!!current_scheme == gsub(":", ".", first))) %>%
  mutate(second_xpos = which(!!current_scheme == gsub(":", ".", second))) %>%
  filter(p < 0.97) %>% # remove this line or change to p < 0.05?
  mutate(sig = signif(p, digits = 3)) %>% # can change to * if preferred
  ungroup() %>%
  group_by(global_y_max) %>%
  mutate(current_test = row_number())
```
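The `valid_comparison` filter keeps only pairs of groups that differ in exactly one grouping variable. A minimal base R sketch of that rule on its own, where `is_valid` is a hypothetical helper name and the group strings follow the colon-separated form of the Tukey contrasts:

```r
# TRUE when two groups such as "2:TM:A" and "1:TM:A" differ in exactly one part
is_valid <- function(first, second) {
  a <- strsplit(first, ":", fixed = TRUE)[[1]]
  b <- strsplit(second, ":", fixed = TRUE)[[1]]
  length(setdiff(a, b)) == 1
}

is_valid("2:TM:A", "1:TM:A")  # only TREAT differs
is_valid("2:TM:A", "1:CX:A")  # TREAT and OPERATOR differ
is_valid("1:TM:B", "1:TM:A")  # only MAT differs
```

Since `setdiff()` compares the parts as a set and ignores their position, the rule only works when level names are distinct across grouping variables, which is the assumption stated earlier.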
Step 4 is to plot the results.
```r
ggplot() +
  geom_boxplot(data = df,
               aes(x = interaction(TREAT, OPERATOR, MAT), # needs to be the same as current_scheme
                   y = St,
                   color = MAT)) +
  geom_textsegment(data = test_df,
                   aes(x = second_xpos,
                       xend = first_xpos,
                       y = global_y_max + current_test * global_y_max / 10, # the factor 10 can be changed
                       yend = global_y_max + current_test * global_y_max / 10,
                       group = current_test, label = sig),
                   vjust = -0.2, size = 3) + # adjust this when needed
  # facet_wrap(~ MAT + OPERATOR, nrow = 1) + ## Do not facet here
  theme_classic() +
  guides(x = "axis_nested") # added this for nested axis label
```
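The step 3 code labels each bar with the rounded adjusted p value and notes that stars could be used instead. A minimal sketch of such a conversion follows; `p_to_stars` is a hypothetical helper, and the cutoffs 0.001, 0.01 and 0.05 are a common convention that I am assuming here, not something prescribed above.

```r
# Convert p values to significance stars (assumed cutoffs: 0.001, 0.01, 0.05)
p_to_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                   labels = c("***", "**", "*", "ns")))
}

p_to_stars(c(0.0004, 0.03, 0.2))
```

In step 3, `mutate(sig = signif(p, digits = 3))` could then be replaced by `mutate(sig = p_to_stars(p))`.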