我们可以访问自定义ggplot2的统计信息中的所有数据列吗?

mjk*_*tfw 5 r ggplot2 survival-analysis tidyverse ggproto

我想ggplot2通过创建新的stats函数和ggproto对象来实现Cox比例风险模型的诊断。这个想法是从分组(按colorfacet_grid等)中受益,以便对所需统计信息(例如mar残差)进行条件计算。在下面的示例中,将对空模型进行重新拟合,并为每个辅助因子水平计算mar。问题是:

  1. 飞度是基于完成data的内部设置的参数ggprotocompute_group方法,而不是实际的数据集。这意味着,数据帧被去除未明确定义的列,aes因此,用户应每次至少提供“事件发生时间”和“事件指示符”列。(除非它们在stat_...功能包装器中进行了硬编码)。可以compute_group以某种方式访问​​原始数据集中的对应行吗?
  2. 对于Rownames compute_groupdata是独一无二的每个PANEL,这意味着它们不能反映原始数据集的supplited实际rownames ggplot,我们再也不能排除如用完整的型号省略不完整的情况下,除非明确指定/硬编码一些id变量。问题与上面相同。
  3. geom图层能否访问y由另一个自定义统计信息(也已绘制)计算的?考虑使用诸如的更平滑的残差散点图。ggplot(data, aes(x = covariate, time = time, event = event)) + stat_funcform() + geom_smooth(aes(y = ..martingales..))..martingales..实际由来计算stat_funcform

# Load & Set data ---------------------------------------------------------

require(data.table)
require(dplyr)
require(ggplot2)
require(survival)
set.seed(1)

dt <- data.table(factor = sample(c(0, 2), 1e3, T),
                 factor2 = sample(c(0, 2, NA), 1e3, T),
                 covariate = runif(1e3, 0, 1))
dt[, `:=`(etime = rexp(1e3, .01 * (.5 + factor + covariate)),
          ctime = runif(1e3, 0, 1e2))]
dt[, `:=`(event = ifelse(etime < ctime, 1, 0),
          time = ifelse(etime < ctime, etime, ctime))]
survfit(Surv(time, event) ~ factor + I(covariate > .5), data = dt) %>% 
  autoplot(fun = 'cloglog')


# Fit coxph model ---------------------------------------------------------

fit <- coxph(Surv(time, event) ~ factor + covariate, data = dt)

# Define custom stat ------------------------------------------------------

StatFuncform <- ggproto(
  "StatFuncform", Stat,
  compute_group = function(data, scales, params) {
    head(data) %>% print
    data$y <- update(params$fit, ~ 1, data = data) %>% 
      resid(type = 'martingale')

    return(data)
  }
)

stat_funcform <- function(mapping = NULL, data = NULL, geom = "point",
                          position = "identity", na.rm = FALSE, show.legend = NA, 
                          inherit.aes = TRUE, ...) {
  layer(
    stat = StatFuncform, data = data, mapping = mapping, geom = geom, 
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}


# Plot --------------------------------------------------------------------

# computation unsuccessfull - no 'time' and 'event' variables    
ggplot(data = dt, 
       mapping = aes(x = covariate, 
                     color = factor(event))) + 
  stat_funcform(params = list(fit = fit)) + 
  facet_grid(~ factor, labeller = label_both, margins = TRUE)

# ok - 'time' and 'event' specified explicitly    
ggplot(data = dt, 
       mapping = aes(x = covariate, time = time, event = event, 
                     factor2 = factor2,
                     color = factor(event))) + 
  stat_funcform(params = list(fit = fit)) + 
  facet_grid(~ factor, labeller = label_both, margins = TRUE)
Run Code Online (Sandbox Code Playgroud)

我知道已经有实现生存分析(例如诊断了几包survminer)或ggplot2autoplot方法,但他们宁愿提供包装,而不是采取默认的优势ggplot2() + stat_语义。