Tags: r, subset, filter, dataframe
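I'm new to R and am currently trying to subset my data according to predefined exclusion criteria. At the moment I'm trying to remove all cases with dementia, as coded by ICD-10. The problem is that there are multiple variables (around 70) containing information on each person's disease status, although because they are all coded the same way, the same condition can be applied to all of them.
Some simulated data: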
# Create dataframe containing simulated data
df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
                 disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
                 disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
                 disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'))
# Data is structured as below:
     ID disease_code_1 disease_code_2 disease_code_3
1  1001           I802           A071           H250
2  1002           H356             NA             NA
3  1003           G560            G20             NA
4  1004           D235             NA           I802
5  1005           B178             NA             NA
6  1006           F011           A049           A481
7  1007           F023             NA             NA
8  1008           C761             NA             NA
9  1009           H653           G300             NA
10 1010           A049           G308             NA
11 1011           J679           A045           D352
Here, I'm trying to remove every case that has a "dementia code" in any of the "disease_code" variables.
#Remove cases with dementia from dataframe (e.g. F023, G20)
Newdata_df <- subset(df, (2:4 != "F023"|"G20"|"F009"|"F002"|"F001"|"F000"|"F00"|
"G309"| "G308"|"G301"|"G300"|"G30"| "F01"|"F018"|"F013"|
"F012"| "F011"| "F010"|"F01"))
The error I receive is:
Error in 2:4 != "F023" | "G20" :
operations are possible only for numeric, logical or complex types
Ideally, the subsetted dataframe would look like this:
     ID disease_code_1 disease_code_2 disease_code_3
1  1001           I802           A071           H250
2  1002           H356             NA             NA
4  1004           D235             NA           I802
5  1005           B178             NA             NA
8  1008           C761             NA             NA
11 1011           J679           A045           D352
I know there is an error in my code, but I'm not sure how to fix it. I've tried several other approaches (using dplyr), with no luck so far.
Any help is greatly appreciated!
One dplyr possibility could be:
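library(dplyr)

# Keep a row only if none of columns 2 to 4 holds one of the listed codes
df %>%
  filter_at(vars(2:4), all_vars(!. %in% c("F023", "G20", "F009", "F002", "F001", "F000", "F00",
                                          "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013",
                                          "F012", "F011", "F010", "F01")))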
    ID disease_code_1 disease_code_2 disease_code_3
1 1001           I802           A071           H250
2 1002           H356             NA             NA
3 1004           D235             NA           I802
4 1005           B178             NA             NA
5 1008           C761             NA             NA
6 1011           J679           A045           D352
Here, a row is kept only if none of columns 2:4 contains any of the given codes.
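To see why the `%in%` test works where the chained `!=` and `|` comparison from the question does not, here is a small base R sketch. The vectors `codes` and `x` are made up for illustration and are not the full code list or the real data.

```r
# Illustrative values only (hypothetical, not the full list from the question)
codes <- c("F023", "G20")
x <- c("I802", "F023", "G20", "NA")

# `|` needs logical (or numeric) operands, so combining two strings fails;
# capture the error message instead of stopping the script
err <- tryCatch("F023" | "G20", error = function(e) conditionMessage(e))

# 2:4 is just the integer vector 2, 3, 4, not columns 2 to 4 of a data frame,
# so this compares the numbers themselves with the string
cmp <- 2:4 != "F023"

# %in% tests each element of x for membership in codes
keep <- !(x %in% codes)
print(keep)  # TRUE FALSE FALSE TRUE
```

Inside `subset()`, a bare `2:4` is not treated as a column selection either, which is why the dplyr version selects the columns explicitly with `vars()`.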
Alternatively, selecting the columns by name:
df %>%
  filter_at(vars(contains("disease_code")), all_vars(!. %in% c("F023", "G20", "F009", "F002", "F001", "F000", "F00",
                                                               "G309", "G308", "G301", "G300", "G30", "F01", "F018", "F013",
                                                               "F012", "F011", "F010", "F01")))
In this case, it checks whether any column with disease_code in its name contains any of the given codes.
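The same row filter can also be sketched in base R, without dplyr. This uses the simulated `df` from the question; the names `dementia_codes`, `code_cols`, `has_dementia` and `df_clean` are chosen here for illustration only.

```r
# Simulated data from the question
df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
                 disease_code_1 = c('I802','H356','G560','D235','B178','F011','F023','C761','H653','A049','J679'),
                 disease_code_2 = c('A071','NA','G20','NA','NA','A049','NA','NA','G300','G308','A045'),
                 disease_code_3 = c('H250','NA','NA','I802','NA','A481','NA','NA','NA','NA','D352'))

# Codes to exclude (same list as above, with the repeated "F01" dropped)
dementia_codes <- c("F023", "G20", "F009", "F002", "F001", "F000", "F00",
                    "G309", "G308", "G301", "G300", "G30", "F01", "F018",
                    "F013", "F012", "F011", "F010")

# All columns whose name starts with "disease_code"
code_cols <- grep("^disease_code", names(df), value = TRUE)

# Flag matching rows per column, then OR the flags together so that
# a hit in any code column marks the whole row
has_dementia <- Reduce(`|`, lapply(df[code_cols], function(col) col %in% dementia_codes))

# Keep only the rows without a dementia code
df_clean <- df[!has_dementia, ]
print(df_clean$ID)  # 1001 1002 1004 1005 1008 1011
```

With around 70 code columns, matching on the column name avoids listing column positions by hand.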