Eri*_* C. 24 regex r dataframe
I am trying to select rows in a data frame where the string in a given column matches a regular expression or substring:
Data frame:
aName bName pName call alleles logRatio strength
AX-11086564 F08_ADN103 2011-02-10_R10 AB CG 0.363371 10.184215
AX-11086564 A01_CD1919 2011-02-24_R11 BB GG -1.352707 9.54909
AX-11086564 B05_CD2920 2011-01-27_R6 AB CG -0.183802 9.766334
AX-11086564 D04_CD5950 2011-02-09_R9 AB CG 0.162586 10.165051
AX-11086564 D07_CD6025 2011-02-10_R10 AB CG -0.397097 9.940238
AX-11086564 B05_CD3630 2011-02-02_R7 AA CC 2.349906 9.153076
AX-11086564 D04_ADN103 2011-02-10_R2 BB GG -1.898088 9.872966
AX-11086564 A01_CD2588 2011-01-27_R5 BB GG -1.208094 9.239801
For example, I want a data frame containing only the rows where the column bName contains ADN. Second, I want all rows where bName contains ADN and pName matches 2011-02-10_R2.
I tried the functions grep(), agrep() and others, but without success...
42-*_*42- 30
subset(dat, grepl("ADN", bName) & pName == "2011-02-10_R2" )
Note the "&" (not "&&", which is not vectorized) and the "==" (not "=", which is assignment).
Note that you could also have used:
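To make the role of the vectorized "&" concrete, here is a minimal sketch. It assumes a hypothetical three-row stand-in for the asker's data (only the bName and pName columns), and the names has_adn, is_r2, adn_only and both are made up for illustration. It also covers the first case from the question, which needs only the bName condition.

```r
# Hypothetical three-row stand-in for the asker's data frame
dat <- data.frame(
  bName = c("F08_ADN103", "A01_CD1919", "D04_ADN103"),
  pName = c("2011-02-10_R10", "2011-02-24_R11", "2011-02-10_R2"),
  stringsAsFactors = FALSE
)

has_adn <- grepl("ADN", dat$bName)       # one TRUE/FALSE per row
is_r2   <- dat$pName == "2011-02-10_R2"  # one TRUE/FALSE per row

# First case from the question: only the bName condition
adn_only <- subset(dat, grepl("ADN", bName))

# Second case: `&` combines the two logical vectors element by element
both <- dat[has_adn & is_r2, ]
```

Here adn_only keeps the two ADN rows and both keeps only the row that also has pName equal to 2011-02-10_R2.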
dat[ with(dat, grepl("ADN", bName) & pName == "2011-02-10_R2" ) , ]
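... which may be preferable inside functions. However, it returns a row of NA values wherever the logical expression evaluates to NA, which happens for rows where dat$pName is NA and bName matches. That defect (which some regard as a feature) can be removed by adding & !is.na(dat$pName) to the logical expression.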
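The NA behaviour can be seen in a small sketch. It assumes a hypothetical three-row data frame in which two pName values are missing; the names keep, with_na, no_na and sub are made up for illustration.

```r
# Hypothetical data: pName is missing in rows 1 and 3
dat <- data.frame(
  bName = c("F08_ADN103", "D04_ADN103", "A01_CD1919"),
  pName = c(NA, "2011-02-10_R2", NA),
  stringsAsFactors = FALSE
)

# TRUE & NA is NA, while FALSE & NA is FALSE
keep <- with(dat, grepl("ADN", bName) & pName == "2011-02-10_R2")

# Indexing with an NA yields a row filled with NA
with_na <- dat[keep, ]

# Adding !is.na() turns the NA into FALSE, so the row is dropped
no_na <- dat[keep & !is.na(dat$pName), ]

# subset() already treats NA in the condition as FALSE
sub <- subset(dat, grepl("ADN", bName) & pName == "2011-02-10_R2")
```

Only the row where bName matches and pName is NA produces an NA row. The third row is dropped either way, because its bName condition is already FALSE.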
Here you go.
First, recreate the data:
dat <- read.table(text="
aName bName pName call alleles logRatio strength
AX-11086564 F08_ADN103 2011-02-10_R10 AB CG 0.363371 10.184215
AX-11086564 A01_CD1919 2011-02-24_R11 BB GG -1.352707 9.54909
AX-11086564 B05_CD2920 2011-01-27_R6 AB CG -0.183802 9.766334
AX-11086564 D04_CD5950 2011-02-09_R9 AB CG 0.162586 10.165051
AX-11086564 D07_CD6025 2011-02-10_R10 AB CG -0.397097 9.940238
AX-11086564 B05_CD3630 2011-02-02_R7 AA CC 2.349906 9.153076
AX-11086564 D04_ADN103 2011-02-10_R2 BB GG -1.898088 9.872966
AX-11086564 A01_CD2588 2011-01-27_R5 BB GG -1.208094 9.239801
", header=TRUE)
Next, use grepl to construct logical indexes of the matches:
index1 <- with(dat, grepl("ADN", bName))
index2 <- with(dat, grepl("2011-02-10_R2", pName))
Now subset using the & operator:
dat[index1 & index2, ]
aName bName pName call alleles logRatio strength
7 AX-11086564 D04_ADN103 2011-02-10_R2 BB GG -1.898088 9.872966
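One caveat with matching pName through grepl: the pattern is treated as a substring, so it would also match longer values. A minimal sketch, assuming a hypothetical value 2011-02-10_R20 that does not occur in the posted data; anchoring the pattern with ^ and $, or comparing with ==, restricts the match to the exact string.

```r
# "2011-02-10_R20" is a hypothetical value, not part of the posted data
pName <- c("2011-02-10_R2", "2011-02-10_R20", "2011-02-10_R10")

substring_match <- grepl("2011-02-10_R2", pName)    # also matches the R20 entry
anchored_match  <- grepl("^2011-02-10_R2$", pName)  # whole string must match
exact_match     <- pName == "2011-02-10_R2"         # plain equality test
```

With the posted data both approaches select the same single row, so the difference only matters if such longer values can appear.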