从后验预测分布中采样(stan vs inla)

use*_*230 5 r mixed-models rstan rstanarm r-inla

我正在尝试在对象bayesplot上实现包中的函数INLA,并且有点不确定如何从后验预测分布中提取。我想我几乎已经拿到了,但是rstan 抽签比抽签更加多变INLA

在 中,使用小插图rstan中的简化示例我可以:bayesplot

library(bayesplot)
library(ggplot2)
library(rstanarm)
library(ggpubr)
library(tidyverse)


#rstan model set up
roaches$roach100 <- roaches$roach1 / 100 # pre-treatment number of roaches (in 100s)
fit_poisson <- stan_glm(y ~ roach100 + treatment + senior, offset = log(exposure2), family = poisson(link = "log"), data = roaches,  seed = 1111,  refresh = 0) 


#In order to use the PPC functions from the bayesplot package we need a vector y of outcome values:
y <- roaches$y
#and a matrix yrep of draws from the posterior predictive distribution,
yrep_poisson <- posterior_predict(fit_poisson, draws = 500)
#then plot:
p1 <- bayesplot::ppc_dens_overlay(y, yrep_poisson[1:50, ])
p1
Run Code Online (Sandbox Code Playgroud)

在此输入图像描述

我想在一个INLA对象上复制该图。根据bayesplot小插图,您可以执行此操作,因为他们提供了代码来定义一个简单的pp_check方法来创建类的拟合模型对象,例如foo

pp_check.foo <- function(object, type = c("multiple", "overlaid"), ...) {
  type <- match.arg(type)
  y <- object[["y"]]
  yrep <- object[["yrep"]]
  stopifnot(nrow(yrep) >= 50)
  samp <- sample(nrow(yrep), size = ifelse(type == "overlaid", 50, 5))
  yrep <- yrep[samp, ]
  
  if (type == "overlaid") {
    ppc_dens_overlay(y, yrep, ...) 
  } else {
    ppc_hist(y, yrep, ...)
  }
}
Run Code Online (Sandbox Code Playgroud)

要使用,pp_check.foo我们只需创建一个包含yyrep组件的列表,并为其指定 foo 类:

x <- list(y = rnorm(200), yrep = matrix(rnorm(1e5), nrow = 500, ncol = 200))
class(x) <- "foo"
#create plot above:
pp_check(x, type = "overlaid")
Run Code Online (Sandbox Code Playgroud)

英拉

#create same model but in inla:
library(INLA)
fit_poisson_inla <- inla(y ~ roach100 + treatment + senior, offset = log(exposure2), data = roaches,
           control.predictor = list(compute = T),
           family = "poisson")
Run Code Online (Sandbox Code Playgroud)

inla_object_name$marginals.fitted.values返回每个 的后验预测分布y

fit_poisson_inla$marginals.fitted.values
#so to get distribution for first oberservation:
fitted.Predictor.1 <- fit_poisson_inla$marginals.fitted.values[[1]]
Run Code Online (Sandbox Code Playgroud)

我认为重复采样会给我带来我需要的东西,但只有 75 个值(dim(fitted.Predictor.1)用于创建此分布的每个观察值,而实际上我希望从整个值范围中采样。我认为我们可以这样做(部分4.3此处)通过使用inla.tmarginal线性预测器:

fitted_dist <- fit_poisson_inla$marginals.linear.predictor
#should i have used "inla.rmarginal(n, marginal)"?
marginal_dist <- lapply(fitted_dist, function(y) inla.tmarginal(function(x) {exp(x)}, y)) %>% map(~ as.data.frame(.) %>%  rename(., xx = x))
#resample 500 times
yrep_poisson_inla <- as.matrix(bind_rows(rerun(500, lapply(marginal_dist, function(x) sample(x$xx, 1)) %>% as.data.frame())))

#convert to class foo for pp_check
x <- list(y = y, yrep = yrep_poisson_inla[1:50, ])
class(x) <- "foo"
p2 <- pp_check(x, type = "overlaid")
#plot
ggarrange(p1, p2, ncol = 1, nrow = 2, labels = c("rstan", "inla sample"))
Run Code Online (Sandbox Code Playgroud)

在此输入图像描述

inla我的问题是如何正确地从这个( fit_poisson_inla) 对象的后验预测分布中获取要传递到的矩阵pp_check yrep_poisson产生离散值,同时yrep_poisson_inla产生连续值。rstan抽签的变化比INLA(第二张图)要多得多。我所做的是否正确,这只是一些采样问题还是不同方法的产物?在更复杂的例子中,差异可能很大。

谢谢