Dom*_*ski 4 r smoothing gam mgcv brms
我需要在多个因素上建立一个平滑项的模型。该by论证允许我对每个因子水平进行平滑建模,但我找不到如何在多个因子上做到这一点。
我尝试了类似于以下的解决方案,但没有成功:
data <- iris
data$factor2 <- rep(c("A", "B"), 75)
mgcv::gam(Sepal.Length ~ s(Petal.Length, by = c(Species, factor2)), data = data)
#> Error in model.frame.default(formula = Sepal.Length ~ 1 + Petal.Length + : variable lengths differ (found for 'c(Species, factor2)')
Run Code Online (Sandbox Code Playgroud)
由reprex 包(v2.0.0)创建于 2021-08-05
欢迎任何帮助!
造成的问题之一interaction()是它改变了模型的矩阵,这意味着模型数据中包含的一些变量发生了变化:
m <- mgcv::gam(body_mass_g ~ s(flipper_length_mm, by = interaction(species, sex)), data = palmerpenguins::penguins)
head(insight::get_data(m))
#> body_mass_g flipper_length_mm species sex
#> 1 3750 181 Adelie.male male
#> 2 3800 186 Adelie.female female
#> 3 3250 195 Adelie.female female
#> 5 3450 193 Adelie.female female
#> 6 3650 190 Adelie.male male
#> 7 3625 181 Adelie.female female
Run Code Online (Sandbox Code Playgroud)
由reprex 包(v2.0.1)创建于 2021-08-06
在使用后处理功能(例如可视化)时,这可能会导致一些问题。
然而,根据 Gavin 和 IRTFM 的答案,可以通过在模型中添加变量作为固定效应来轻松解决这个问题。
这是一个演示,还说明了两个单独的平滑和交互之间的差异:
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.0.5
set.seed(1)
# Create data
data <- data.frame(x = rep(seq(-10, 10, length.out = 500), 2),
fac1 = as.factor(rep(c("A", "B", "C"), length.out = 1000)),
fac2 = as.factor(rep(c("X", "Y"), each = 500)))
data$y <- data$x^2 + rnorm(nrow(data), sd = 5)
data$y[data$fac1 == "A"] <- sign(data$x[data$fac1 == "A"]) * data$y[data$fac1 == "A"] + 50
data$y[data$fac1 == "B"] <- datawizard::change_scale(data$y[data$fac1 == "B"]^3, c(-50, 100))
data$y[data$fac2 == "X" & data$fac1 == "C"] <- data$y[data$fac2 == "X" & data$fac1 == "C"] - 100
data$y[data$fac2 == "X" & data$fac1 == "B"] <- datawizard::change_scale(data$y[data$fac2 == "X" & data$fac1 == "B"] ^ 2, c(-50, 100))
data$y[data$fac2 == "X" & data$fac1 == "A"] <- datawizard::change_scale(data$y[data$fac2 == "X" & data$fac1 == "A"] * -3, c(0, 100))
# Real trends
ggplot(data, aes(x = x, y = y, color = fac1, shape = fac2)) +
geom_point()
Run Code Online (Sandbox Code Playgroud)

# Two smooths
m <- mgcv::gam(y ~ fac1 * fac2 + s(x, by = fac1) + s(x, by = fac2), data = data)
plot(modelbased::estimate_relation(m, length = 100, preserve_range = F))
Run Code Online (Sandbox Code Playgroud)

# Interaction
m <- mgcv::gam(y ~ fac1 * fac2 + s(x, by = interaction(fac1, fac2)), data = data)
plot(modelbased::estimate_relation(m, length = 100, preserve_range = F))
Run Code Online (Sandbox Code Playgroud)

由reprex 包(v2.0.1)创建于 2021-08-06
最后一个模型成功地恢复了每个因素组合的趋势。