Subsetting columns by name using a specific string

Question

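I am trying to subset a data frame based on column names that start with a specific string. I have some columns named like ABC_1, ABC_2, ABC_3, and others named like ABC_XYZ_1, ABC_XYZ_2, ABC_XYZ_3.

How can I subset the data frame so that it contains only the columns ABC_1, ABC_2, ABC_3 ... ABC_n, and not ABC_XYZ_1, ABC_XYZ_2, ...?

This is what I have tried: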

set.seed(1)
df <- data.frame(ABC_1     = sample(0:1, 3, replace = TRUE),
                 ABC_2     = sample(0:1, 3, replace = TRUE),
                 ABC_XYZ_1 = sample(0:1, 3, replace = TRUE),
                 ABC_XYZ_2 = sample(0:1, 3, replace = TRUE))

# Keep every column whose name contains "ABC"
df1 <- df[ , grepl("ABC", names(df))]

# Flag rows where any value is greater than 0
ind <- apply(df1, 1, function(x) any(x > 0))

df1[ind, ]

But this returns both the ABC_1 ... ABC_n columns and the ABC_XYZ_1 ... ABC_XYZ_n columns. I am not interested in the ABC_XYZ_1 columns, only in the columns like ABC_1. Any suggestions are much appreciated.

Answer 1

Jot*_*ota 6

To specify "ABC_" followed by one or more digits (i.e. \\d+ or [0-9]+), you can use

df1 <- df[ , grepl("ABC_\\d+", names( df ), perl = TRUE ) ]
# df1 <- df[ , grepl("ABC_[0-9]+", names( df ), perl = TRUE ) ] # another option
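
To see why the digit requirement helps, it can be useful to try the patterns on a plain character vector first. This is a minimal sketch; `cols` is a made-up vector standing in for names(df), not something from the question.

```r
# Hypothetical column names, standing in for names(df)
cols <- c("ABC_1", "ABC_2", "ABC_XYZ_1", "ABC_XYZ_2")

# "ABC" alone matches every name, which is why the original attempt kept all columns
grepl("ABC", cols)

# Requiring digits directly after "ABC_" rules out the ABC_XYZ_ names
keep <- grepl("ABC_\\d+", cols)
cols[keep]
```

Here the first call returns TRUE for all four names, while `cols[keep]` contains only "ABC_1" and "ABC_2".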

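To require the column name to start with "ABC_", add ^ to the regular expression, so that it matches only when "ABC_\d+" occurs at the start of the string rather than anywhere in it.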

df1 <- df[ , grepl("^ABC_\\d+", names( df ), perl = TRUE ) ]
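
Note that a start anchor alone would still accept a name such as ABC_1_XYZ, because nothing constrains what follows the digits. If such columns could occur (an assumption; none appear in the question's data), adding `$` anchors the end as well. The sketch below uses a small hand-written data frame with a hypothetical ABC_1_XYZ column, and `drop = FALSE` so the result stays a data frame even when a single column matches.

```r
# Example data; ABC_1_XYZ is a hypothetical column added to illustrate the end anchor
df <- data.frame(ABC_1     = c(0, 1, 0),
                 ABC_2     = c(0, 0, 0),
                 ABC_XYZ_1 = c(1, 1, 1),
                 ABC_1_XYZ = c(1, 0, 1))

# ^ anchors the start and $ the end: "ABC_" followed by digits and nothing else
keep <- grepl("^ABC_[0-9]+$", names(df))

# drop = FALSE keeps a data frame even if only one column matches
df1 <- df[, keep, drop = FALSE]

# Keep rows where any selected column is greater than 0
df1[rowSums(df1 > 0) > 0, , drop = FALSE]
```

`rowSums(df1 > 0) > 0` does the same job as the `apply(..., any)` call in the question, without looping over rows.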

If dplyr is more to your liking, you can try

library(dplyr)
select(df, matches("^ABC_\\d+"))
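
One thing to be aware of with the dplyr version: `matches()` ignores case by default, so it behaves slightly differently from `grepl()`. A small sketch, assuming dplyr is installed and using a made-up data frame with a lower-case `abc_2` column to show the difference:

```r
library(dplyr)

# Hypothetical data; abc_2 is lower-case on purpose
df <- data.frame(ABC_1 = 1:3, abc_2 = 4:6, ABC_XYZ_1 = 7:9)

# Default ignore.case = TRUE: abc_2 is selected as well
loose <- names(select(df, matches("^ABC_\\d+$")))

# ignore.case = FALSE restricts the match to upper-case "ABC_"
strict <- names(select(df, matches("^ABC_\\d+$", ignore.case = FALSE)))
```

With this data, `loose` holds both ABC_1 and abc_2, while `strict` holds only ABC_1.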
