M_O*_*ord 1 loops r linear-regression lapply purrr
我目前正在尝试运行一个循环,对多个自变量 (n = 6) 和多个因变量 (n=1000) 执行线性回归。
这是一些示例数据,年龄、性别和教育程度代表我感兴趣的自变量,testscore_* 是我的因变量。
df = data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006,1007, 1008, 1009, 1010, 1011),
age = as.numeric(c('56', '43','59','74','61','62','69','80','40','55','58')),
sex = as.numeric(c('0','1','0','0','1','1','0','1','0','1','0')),
testscore_1 = as.numeric(c('23','28','30','15','7','18','29','27','14','22','24')),
testscore_2 = as.numeric(c('1','3','2','5','8','2','5','6','7','8','2')),
testscore_3 = as.numeric(c('18','20','19','15','20','23','19','25','10','14','12')),
education = as.numeric(c('5','4','3','5','2', '1','4','4','3','5','2')))
Run Code Online (Sandbox Code Playgroud)
我有工作代码,允许我为多个 DV 运行回归模型(我确信更有经验的 R 用户会因为缺乏效率而不喜欢它):
y <- as.matrix(df[4:6])
#model for age
lm_results <- lm(y ~ age, data = df)
write.csv((broom::tidy(lm_results)), "lm_results_age.csv")
regression_results <-broom::tidy(lm_results)
standardized_coefficients <- lm.beta(lm_results)
age_standardize_results <- coef(standardized_coefficients)
write.csv(age_standardize_results, "lm_results_age_standardized_coefficients.csv")
Run Code Online (Sandbox Code Playgroud)
age然后我会通过手动替换为sexand来重复这一切 education
有没有人有更优雅的方式来运行这个 - 例如,通过循环所有感兴趣的 IV(即年龄、性别和教育)?
如果有人建议一种broom::tidy(lm_results)与 中的标准化系数相结合的快速方法lm.beta::lm.beta,即将标准化回归系数与主模型输出相结合,我也将非常感激。
这是对我过去必须使用的类似工作流程的改编。请记住,要因运行大量模型而真正惩罚自己。我在您的数据框中添加了几个预测列。祝你好运!!
解决方案:
# Creating pedictor and outcome vectors
ivs_vec <- names(df)[c(2:6, 10)]
dvs_vec <- names(df)[7:9]
# Creating formulas and running the models
ivs <- paste0(" ~ ", ivs_vec)
dvs_ivs <- unlist(lapply(ivs, function(x) paste0(dvs_vec, x)))
formulas <- lapply(dvs_ivs, formula)
lm_results <- lapply(formulas, function(x) {
lm(x, data = df)
})
# Creating / combining results
tidy_results <- lapply(lm_results, broom::tidy)
dv_list <- lapply(as.list(stringi::stri_extract_first_words(dvs_ivs)), rep, 2)
tidy_results <- Map(cbind, dv_list, tidy_results)
standardized_results <- lapply(lm_results, function(x) coef(lm.beta::lm.beta(x)))
combined_results <- Map(cbind, tidy_results, standardized_results)
# Cleaning up final results
names(combined_results) <- dvs_ivs
combined_results <- lapply(combined_results, function(x) {row.names(x) <- c(NULL); x})
new_names <- c("Outcome", "Term", "Estimate", "Std. Error", "Statistic", "P-value", "Standardized Estimate")
combined_results <- lapply(combined_results, setNames, new_names)
Run Code Online (Sandbox Code Playgroud)
结果:
combined_results[1:5]
$`testscore_1 ~ age`
Outcome Term Estimate Std. Error Statistic P-value
Standardized Estimate
1 testscore_1 (Intercept) 18.06027731 12.3493569 1.4624468 0.1776424 0.00000000
2 testscore_1 age 0.05835152 0.2031295 0.2872627 0.7804155 0.09531823
$`testscore_2 ~ age`
Outcome Term Estimate Std. Error Statistic P-value Standardized Estimate
1 testscore_2 (Intercept) 3.63788676 4.39014570 0.8286483 0.4287311 0.0000000
2 testscore_2 age 0.01367313 0.07221171 0.1893478 0.8540216 0.0629906
$`testscore_3 ~ age`
Outcome Term Estimate Std. Error Statistic P-value Standardized Estimate
1 testscore_3 (Intercept) 6.1215175 6.698083 0.9139208 0.3845886 0.0000000
2 testscore_3 age 0.1943125 0.110174 1.7636870 0.1116119 0.5068026
$`testscore_1 ~ sex`
Outcome Term Estimate Std. Error Statistic P-value Standardized Estimate
1 testscore_1 (Intercept) 22.5 3.099283 7.2597435 4.766069e-05 0.0000000
2 testscore_1 sex -2.1 4.596980 -0.4568217 6.586248e-01 -0.1505386
$`testscore_2 ~ sex`
Outcome Term Estimate Std. Error Statistic P-value Standardized Estimate
1 testscore_2 (Intercept) 3.666667 1.041129 3.521816 0.006496884 0.0000000
2 testscore_2 sex 1.733333 1.544245 1.122447 0.290723029 0.3504247
Run Code Online (Sandbox Code Playgroud)
数据:
df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006,1007, 1008, 1009, 1010, 1011),
age = as.numeric(c('56', '43','59','74','61','62','69','80','40','55','58')),
sex = as.numeric(c('0','1','0','0','1','1','0','1','0','1','0')),
pred1 = sample(1:11, 11),
pred2 = sample(1:11, 11),
pred3 = sample(1:11, 11),
testscore_1 = as.numeric(c('23','28','30','15','7','18','29','27','14','22','24')),
testscore_2 = as.numeric(c('1','3','2','5','8','2','5','6','7','8','2')),
testscore_3 = as.numeric(c('18','20','19','15','20','23','19','25','10','14','12')),
education = as.numeric(c('5','4','3','5','2', '1','4','4','3','5','2')))
Run Code Online (Sandbox Code Playgroud)