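I am new to R and am trying to subset my data according to predefined exclusion criteria. Specifically, I want to remove all cases with dementia, as identified by ICD-10 codes. The problem is that information about each person's disease status is spread across multiple variables (about 70), although they are all coded the same way, so the same condition can be applied to every one of them.
Some simulated data: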
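One possible approach, sketched here on a small toy data frame in the same shape as the simulated data in this post: build a single regular expression for the dementia codes, test every `disease_code_*` column against it, and drop any row with at least one match. The prefix list (`F00` to `F03` and `G30`) is an assumption made for illustration, since the post does not state which ICD-10 codes count as dementia. The names `dementia_pattern`, `code_cols`, `has_dementia` and `df_no_dementia` are likewise hypothetical.

```r
# Toy data in the same shape as the post's example
# (missing codes are stored as the string "NA", as in the post).
df <- data.frame(
  ID = c(1001, 1003, 1006, 1007, 1009, 1010),
  disease_code_1 = c("I802", "G560", "F011", "F023", "H653", "A049"),
  disease_code_2 = c("A071", "G20", "A049", "NA", "G300", "G308"),
  disease_code_3 = c("H250", "NA", "A481", "NA", "NA", "NA")
)

# ASSUMPTION: dementia = codes starting with F00-F03 or G30.
# Replace this with the exclusion list from your own protocol.
dementia_pattern <- "^(F0[0-3]|G30)"

# Select every disease column by name, so the same code works for ~70 columns.
code_cols <- grep("^disease_code_", names(df), value = TRUE)

# For each row, TRUE if any disease column matches the pattern.
has_dementia <- apply(df[code_cols], 1, function(codes) {
  any(grepl(dementia_pattern, codes))
})

# Keep only the rows without a dementia code.
df_no_dementia <- df[!has_dementia, ]
print(df_no_dementia)
```

Because the columns are selected by name pattern rather than listed by hand, nothing changes when the number of disease columns grows. `grepl()` returns `FALSE` for both the string `"NA"` and a true `NA`, so missing codes never trigger an exclusion.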
#Create dataframe containing simulated data
df = data.frame(ID = c(1001, 1002, 1003, 1004, 1005,1006,1007,1008,1009,1010,1011),
disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'))
#data is structured as below:
ID disease_code_1 disease_code_2 disease_code_3
1 1001 I802 A071 H250
2 1002 H356 NA NA
3 1003 G560 G20 NA
4 1004 D235 NA I802
5 1005 B178 NA NA
6 1006 F011 A049 A481
7 1007 F023 NA NA
8 1008 C761 NA NA
9 1009 H653 G300 NA
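10 …
I am currently trying to run a loop that performs linear regression for multiple independent variables (n = 6) and multiple dependent variables (n = 1000).
Here is some example data: age, sex, and education are the independent variables I am interested in, and testscore_* are my dependent variables.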
df = data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006,1007, 1008, 1009, 1010, 1011),
age = as.numeric(c('56', '43','59','74','61','62','69','80','40','55','58')),
sex = as.numeric(c('0','1','0','0','1','1','0','1','0','1','0')),
testscore_1 = as.numeric(c('23','28','30','15','7','18','29','27','14','22','24')),
testscore_2 = as.numeric(c('1','3','2','5','8','2','5','6','7','8','2')),
testscore_3 = as.numeric(c('18','20','19','15','20','23','19','25','10','14','12')),
education = as.numeric(c('5','4','3','5','2', '1','4','4','3','5','2')))
I have working code that lets me run the regression models for multiple DVs (I'm sure more experienced R users will dislike it for its inefficiency):
y <- as.matrix(df[4:6])
#model for age
lm_results <- lm(y ~ age, data = df)
write.csv((broom::tidy(lm_results)), "lm_results_age.csv")
regression_results <-broom::tidy(lm_results)
# lm.beta() is not part of base R; a package that provides it (e.g. lm.beta) must be loaded first
standardized_coefficients <- lm.beta(lm_results)
age_standardize_results <- coef(standardized_coefficients)
write.csv(age_standardize_results, "lm_results_age_standardized_coefficients.csv")
I then repeat all of this by manually replacing age with sex and education.
Does anyone have a more elegant way to run this …
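A minimal sketch of one way to avoid the manual repetition, using only base R: wrap the model fit in a function, apply it to each predictor name with `lapply()`, and stack the results into one table. The helper `fit_one` and the output column names are hypothetical. The standardized slope is computed by hand as `slope * sd(x) / sd(y)`, which is valid only because each model has a single predictor; it stands in for the `lm.beta()` call in the post rather than reproducing it.

```r
# Example data from the post (numeric columns entered directly).
df <- data.frame(
  age = c(56, 43, 59, 74, 61, 62, 69, 80, 40, 55, 58),
  sex = c(0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0),
  education = c(5, 4, 3, 5, 2, 1, 4, 4, 3, 5, 2),
  testscore_1 = c(23, 28, 30, 15, 7, 18, 29, 27, 14, 22, 24),
  testscore_2 = c(1, 3, 2, 5, 8, 2, 5, 6, 7, 8, 2),
  testscore_3 = c(18, 20, 19, 15, 20, 23, 19, 25, 10, 14, 12)
)

predictors <- c("age", "sex", "education")
outcomes <- grep("^testscore_", names(df), value = TRUE)

# Hypothetical helper: one predictor, all outcomes at once.
fit_one <- function(predictor, data, outcomes) {
  y <- as.matrix(data[outcomes])              # matrix response, as in the post
  fit <- lm(reformulate(predictor, response = "y"), data = data)
  est <- coef(fit)                            # 2 x n_outcomes coefficient matrix
  # summary() of a multi-response fit is a list with one summary per outcome.
  p <- sapply(summary(fit), function(s) s$coefficients[predictor, "Pr(>|t|)"])
  data.frame(
    predictor = predictor,
    outcome   = colnames(est),
    intercept = unname(est["(Intercept)", ]),
    slope     = unname(est[predictor, ]),
    # Standardized slope for a single-predictor model (equals Pearson's r).
    std_slope = unname(est[predictor, ] * sd(data[[predictor]]) / apply(y, 2, sd)),
    p_value   = unname(p),
    row.names = NULL
  )
}

results <- do.call(rbind, lapply(predictors, fit_one, data = df, outcomes = outcomes))
print(results)

# Optionally save everything in one file (file name is illustrative):
# write.csv(results, "lm_results_all.csv", row.names = FALSE)
```

Collecting everything in one long table (one row per predictor and outcome pair) means a single CSV replaces the separate file per predictor. With 1000 outcomes the same code applies unchanged, because `outcomes` is built from the column names.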
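I am currently trying to exclude outliers based on a subset of selected variables, with the aim of performing a sensitivity analysis. I have adapted the function available here (Calculating outliers in R), but so far without success (I am still a novice R user). Please let me know if you have any suggestions!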
df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
measure1 = rnorm(11, mean = 8, sd = 4),
measure2 = rnorm(11, mean = 40, sd = 5),
measure3 = rnorm(11, mean = 20, sd = 2),
measure4 = rnorm(11, mean = 9, sd = 3))
vars_of_interest <- c("measure1", "measure3", "measure4")
# define a function to remove outliers
FindOutliers <- function(data) {
lowerq = quantile(data)[2]
upperq = quantile(data)[4] …
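Since the function above is cut off, the following is only a sketch of one common way to do this, not a completion of the original code. It assumes the usual 1.5 × IQR fences, a rule the truncated post does not confirm. It flags outliers column by column for the variables in `vars_of_interest` and keeps the rows with no flag. The helper `is_outlier` and the fixed toy values (used instead of `rnorm()` so the result is reproducible) are hypothetical.

```r
# Fixed toy data with planted extreme values (instead of rnorm(), for reproducibility).
df <- data.frame(
  ID = 1001:1011,
  measure1 = c(8, 9, 7, 8, 10, 9, 8, 7, 9, 8, 40),          # row 11 is extreme
  measure2 = c(40, 41, 39, 100, 42, 38, 40, 41, 39, 40, 42), # row 4 is extreme, but not of interest
  measure3 = c(20, 21, 19, 20, 22, 20, 21, 19, 20, 21, 20),
  measure4 = c(9, -5, 8, 11, 9, 10, 8, 11, 9, 10, 9)         # row 2 is extreme
)
vars_of_interest <- c("measure1", "measure3", "measure4")

# ASSUMPTION: an outlier lies more than k * IQR outside the quartiles (k = 1.5).
is_outlier <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

# Logical matrix: one row per case, one column per variable of interest.
outlier_flags <- sapply(df[vars_of_interest], is_outlier)

# Keep cases that are not flagged on any of the selected variables.
keep <- rowSums(outlier_flags, na.rm = TRUE) == 0
df_clean <- df[keep, ]
print(df_clean$ID)
```

Row 4 stays in `df_clean` even though its `measure2` value is extreme, because `measure2` is not in `vars_of_interest`. For the sensitivity analysis, the same models can then be run on `df` and `df_clean` and compared.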