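I would like to use group_by %>% do(tidy(*)) to run several linear regression models and extract the model results into a data frame. The data frame should include the following for each model: the outcome variable, the exposure variable, the sample size, the beta coefficient, the SE, and the p-value.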
library(tidyverse)
data("mtcars")
outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariates <- c("drat", "qsec")
The models should regress each outcome on each exposure, adjusted for all covariates, for example:
lm(wt ~ factor(gear)+drat+qsec, mtcars, na.action = na.omit)
lm(wt ~ factor(vs)+drat+qsec, mtcars, na.action = na.omit)
etc...
Maybe the final code would look something like this?
models <- (mtcars %>%
  gather(x_var, x_value, -c(y_var, y_i, cv1:cv3)) %>%
  group_by(y_var, x_var) %>%
  do(broom::tidy(lm(y_i ~ x_value + cv1 + cv2 + cv3, data = .))))
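One possible way to get there, sketched under some assumptions: instead of reshaping with gather and group_by, this version loops over every outcome/exposure pair and builds each formula with reformulate(). The helper name fit_one and the output column names (n, beta, se, p_value) are made up for illustration, and only the exposure terms are kept from each model.

```r
library(dplyr)
library(broom)

data("mtcars")
outcomes   <- c("wt", "mpg", "hp", "disp")
exposures  <- c("gear", "vs", "am")
covariates <- c("drat", "qsec")

# Hypothetical helper: fit one adjusted model and keep only the exposure terms
fit_one <- function(outcome, exposure) {
  exposure_term <- sprintf("factor(%s)", exposure)
  # Builds e.g. wt ~ factor(gear) + drat + qsec
  model_formula <- reformulate(c(exposure_term, covariates), response = outcome)
  fit <- lm(model_formula, data = mtcars, na.action = na.omit)

  tidy(fit) %>%
    filter(startsWith(term, exposure_term)) %>%
    transmute(outcome = outcome,
              exposure = exposure,
              term,
              n = nobs(fit),        # observations actually used in the fit
              beta = estimate,
              se = std.error,
              p_value = p.value)
}

# Every outcome/exposure combination, one model each
grid <- expand.grid(outcome = outcomes, exposure = exposures,
                    stringsAsFactors = FALSE)
models <- bind_rows(Map(fit_one, grid$outcome, grid$exposure))
models
```

Because gear is entered as a factor with three levels, it contributes two rows per outcome (one per non-reference level), while vs and am contribute one row each.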
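How can I randomly add missing values to some or all columns of a simulated data frame (for example, about 5% missing at random in each column)? Also, is there a more efficient way to simulate a data frame with both continuous and factor columns?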
library(dplyr)

# Simulate some data
N <- 2000
data <- data.frame(id = 1:2000, age = rnorm(N, 18:90), bmi = rnorm(N, 15:40),
                   chol = rnorm(N, 50:350), insulin = rnorm(N, 2:40), sbp = rnorm(N, 50:200),
                   dbp = rnorm(N, 30:150), sex = c(rep(1, 1000), rep(2, 1000)),
                   smoke = rep(c(1, 2), 1000), educ = sample(LETTERS[1:4]))

# Manually add some missing values
data <- data %>%
  mutate(age = "is.na<-"(age, age < 19 | age > 88),
         bmi = "is.na<-"(bmi, bmi > 38 | bmi < 16),
         insulin = "is.na<-"(insulin, insulin > 38),
         educ = "is.na<-"(educ, bmi > 35))
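A minimal sketch of one way to do this in base R, assuming that "missing at random" here means a fixed 5% of positions chosen uniformly in each column. The helper add_missing, the data frame name sim, the distribution parameters, and the seed are all illustrative assumptions rather than anything taken from the question.

```r
set.seed(42)  # assumption: fixed seed, only so the example is reproducible
N <- 2000

# Continuous and factor columns simulated in one data.frame() call
# (distribution parameters are placeholders)
sim <- data.frame(
  id   = seq_len(N),
  age  = runif(N, min = 18, max = 90),
  bmi  = rnorm(N, mean = 27, sd = 4),
  sex  = factor(sample(c("male", "female"), N, replace = TRUE)),
  educ = factor(sample(LETTERS[1:4], N, replace = TRUE))
)

# Hypothetical helper: set a random proportion of a vector's entries to NA
add_missing <- function(x, prop = 0.05) {
  x[sample(length(x), size = round(prop * length(x)))] <- NA
  x
}

# Apply to every column except id
sim[-1] <- lapply(sim[-1], add_missing)

# Proportion of missing values per column
colMeans(is.na(sim))
```

To blank only some columns, the same lapply call can be restricted to a subset, for example sim[c("age", "bmi")]. If the proportion should vary randomly around 5% rather than be exact, one option is to draw the mask with rbinom(length(x), 1, prop) == 1 instead of sample().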