Selecting rows in an R data frame using a regexp

Eri*_* C. 24 regex r dataframe

I am trying to select the rows of a data frame where the string in a given column matches a regular expression or a substring:

Data frame:

aName   bName   pName   call  alleles   logRatio    strength
AX-11086564 F08_ADN103  2011-02-10_R10  AB  CG  0.363371    10.184215
AX-11086564 A01_CD1919  2011-02-24_R11  BB  GG  -1.352707   9.54909
AX-11086564 B05_CD2920  2011-01-27_R6   AB  CG  -0.183802   9.766334
AX-11086564 D04_CD5950  2011-02-09_R9   AB  CG  0.162586    10.165051
AX-11086564 D07_CD6025  2011-02-10_R10  AB  CG  -0.397097   9.940238
AX-11086564 B05_CD3630  2011-02-02_R7   AA  CC  2.349906    9.153076
AX-11086564 D04_ADN103  2011-02-10_R2   BB  GG  -1.898088   9.872966
AX-11086564 A01_CD2588  2011-01-27_R5   BB  GG  -1.208094   9.239801

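For example, I want a data frame that contains only the rows where the column bName contains ADN. Secondly, I want all rows where bName contains ADN and pName matches 2011-02-10_R2.

I tried the functions grep(), agrep() and others, but without success...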

42-*_*42- 30

subset(dat, grepl("ADN", bName)  &  pName == "2011-02-10_R2" )

Note the "&" (and not "&&", which is not vectorized) and the "==" (and not "=", which is assignment).
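To make the point about vectorization concrete, here is a minimal sketch using three values taken from the question's data. The vector names `b`, `p` and `keep` are made up for illustration. `&` compares element by element and returns one logical value per row, which is what row selection needs. `&&` is meant for single values, and to my understanding recent R versions raise an error when it is given longer vectors.

```r
# Three bName / pName pairs copied from the question's data
b <- c("F08_ADN103", "A01_CD1919", "D04_ADN103")
p <- c("2011-02-10_R10", "2011-02-24_R11", "2011-02-10_R2")

# grepl() returns one TRUE/FALSE per element, and so does ==
# & combines the two logical vectors element by element
keep <- grepl("ADN", b) & p == "2011-02-10_R2"

print(keep)
```

Only the third pair satisfies both conditions, so `keep` can be used directly as a row index.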

Note that you could have used:

 dat[ with(dat,  grepl("ADN", bName)  &  pName == "2011-02-10_R2" ) , ]

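...and that is perhaps better when used inside functions. However, it returns rows of NA values wherever the logical expression evaluates to NA, which here happens when dat$pName is NA and bName matches. That defect (which some regard as a feature) can be removed by adding & !is.na(dat$pName) to the logical expression.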
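The NA behaviour is easier to see on a small example. The sketch below uses a made-up data frame, `dat_na`, that is not part of the question; it only borrows a few values and adds missing pName entries to show the difference between the three approaches.

```r
# Hypothetical data with missing pName values (not from the question)
dat_na <- data.frame(
  bName = c("F08_ADN103", "D04_ADN103", "A01_CD1919"),
  pName = c(NA, "2011-02-10_R2", NA),
  stringsAsFactors = FALSE
)

# TRUE & NA is NA, FALSE & NA is FALSE, so idx is c(NA, TRUE, FALSE)
idx <- with(dat_na, grepl("ADN", bName) & pName == "2011-02-10_R2")

# "[" turns an NA index into a row filled with NA
res_bracket <- dat_na[idx, ]

# Adding !is.na() turns the NA into FALSE, so the NA row disappears
res_fixed <- dat_na[idx & !is.na(dat_na$pName), ]

# subset() treats NA in the condition as FALSE
res_subset <- subset(dat_na, grepl("ADN", bName) & pName == "2011-02-10_R2")

print(nrow(res_bracket))
print(nrow(res_fixed))
print(nrow(res_subset))
```

Note that the row with bName A01_CD1919 never produces an NA row even though its pName is missing, because the bName condition is already FALSE.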


And*_*rie 8

Here you go.

First, recreate the data:

dat <- read.table(text="
aName   bName   pName   call  alleles   logRatio    strength
AX-11086564 F08_ADN103  2011-02-10_R10  AB  CG  0.363371    10.184215
AX-11086564 A01_CD1919  2011-02-24_R11  BB  GG  -1.352707   9.54909
AX-11086564 B05_CD2920  2011-01-27_R6   AB  CG  -0.183802   9.766334
AX-11086564 D04_CD5950  2011-02-09_R9   AB  CG  0.162586    10.165051
AX-11086564 D07_CD6025  2011-02-10_R10  AB  CG  -0.397097   9.940238
AX-11086564 B05_CD3630  2011-02-02_R7   AA  CC  2.349906    9.153076
AX-11086564 D04_ADN103  2011-02-10_R2   BB  GG  -1.898088   9.872966
AX-11086564 A01_CD2588  2011-01-27_R5   BB  GG  -1.208094   9.239801
", header=TRUE)

Next, use grepl to construct logical indices of the matches:

index1 <- with(dat, grepl("ADN", bName))
index2 <- with(dat, grepl("2011-02-10_R2", pName))

Now subset using the & operator:

dat[index1 & index2, ]
        aName      bName         pName call alleles  logRatio strength
7 AX-11086564 D04_ADN103 2011-02-10_R2   BB      GG -1.898088 9.872966
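One difference between the two answers is worth keeping in mind: grepl("2011-02-10_R2", pName) is a substring match, while pName == "2011-02-10_R2" is an exact comparison. On the question's data both give the same result. The sketch below shows where they would diverge; the value "2011-02-10_R20" is a made-up example that does not occur in the original data, and the vector name `p` is also just for illustration.

```r
# The second value is hypothetical and only there to show the difference
p <- c("2011-02-10_R2", "2011-02-10_R20", "2011-02-10_R10")

# Unanchored pattern: matches anywhere inside the string
substring_match <- grepl("2011-02-10_R2", p)

# Anchored pattern: ^ and $ force the whole string to match
anchored_match <- grepl("^2011-02-10_R2$", p)

# Plain equality, no regular expression involved
exact_match <- p == "2011-02-10_R2"

print(substring_match)
print(anchored_match)
print(exact_match)
```

If an exact match is what you want, == or an anchored pattern is the safer choice; the unanchored form is convenient when a partial match such as ADN is the goal.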