Subset a data frame by a complex pattern in column names

Question

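I have a data set that looks like this:

two waves of data (.t0 and .t1)
multiple scales (this and that)
several items per scale (1, 22, 22a)
several variables to ignore (v2, v3, ignore.t0, ignore.t1, this.t0, this.t1, that.t0, that.t1)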

dat <- data.frame(id = seq(from=1, to=10, by=1),
                  v2 = rnorm(10),
                  v3 = rnorm(10),
                  ignore.t0 = rnorm(10),
                  this.t0 = rnorm(10),
                  this1.t0 = rnorm(10),
                  this22.t0 = rnorm(10),
                  this22a.t0 = rnorm(10),
                  that.t0 = rnorm(10),
                  that1.t0 = rnorm(10),
                  that22.t0 = rnorm(10),
                  that22a.t0 = rnorm(10),
                  ignore.t1 = rnorm(10),
                  this.t1 = rnorm(10),
                  this1.t1 = rnorm(10),
                  this22.t1 = rnorm(10),
                  this22a.t1 = rnorm(10),
                  that.t1 = rnorm(10),
                  that1.t1 = rnorm(10),
                  that22.t1 = rnorm(10),
                  that22a.t1 = rnorm(10))

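I would like to subset the data frame so that it contains id plus only those columns whose names contain:

a scale name (this or that) AND
a number (1.) or a number and a letter (22a.) immediately before the period

So in the end, the data frame should look like this: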

dat2 <- data.frame(
                   id = seq(from=1, to=10, by=1),
                   #v2 = rnorm(10),
                   #v3 = rnorm(10),
                   #ignore.t0 = rnorm(10),
                   #this.t0 = rnorm(10),
                   this1.t0 = rnorm(10),
                   this22.t0 = rnorm(10),
                   this22a.t0 = rnorm(10),
                   #that.t0 = rnorm(10),
                   that1.t0 = rnorm(10),
                   that22.t0 = rnorm(10),
                   that22a.t0 = rnorm(10),
                   #ignore.t1 = rnorm(10),
                   #this.t1 = rnorm(10),
                   this1.t1 = rnorm(10),
                   this22.t1 = rnorm(10),
                   this22a.t1 = rnorm(10),
                   #that.t1 = rnorm(10),
                   that1.t1 = rnorm(10),
                   that22.t1 = rnorm(10),
                   that22a.t1 = rnorm(10))

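The real data frame is much larger than the one shown here, so typing out the column indices is not feasible. Matching on the scale names alone does not work either, because this.t0, this.t1, that.t0, and that.t1 would also be captured.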

# not quite right
dat2$id <- dat$id
scales <- c("this", "that")
keep.index <- grep(paste(scales,collapse="|"), names(dat))
temp <- dat[keep.index]
dat2 <- cbind(dat2, temp)

How can I modify the grep pattern to look for a number OR (a number and a letter) before the period? Or is there a better way?

Answer 1

flo*_*del 6

This works for your example:

dat[c("id", grep("(this|that)\\d+[a-z]?\\.", names(dat), value = TRUE))]

where:

\\d+ matches one or more digits
[a-z]? matches zero or one lowercase letter
\\. matches a literal dot
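To see how the three pieces combine, the pattern can be checked against a handful of column names from the question. This is only a minimal sketch: the vector cols is a hypothetical subset picked for illustration, not the full data frame.

```r
# Hypothetical subset of the column names from the question
cols <- c("id", "v2", "ignore.t0", "this.t0", "this1.t0",
          "this22.t0", "this22a.t0", "that.t1", "that22a.t1")

# Scale name, then one or more digits, an optional lowercase letter,
# and a literal dot
pattern <- "(this|that)\\d+[a-z]?\\."

# grepl() returns one logical per name; use it to keep the matching names
matched <- cols[grepl(pattern, cols)]
print(matched)
```

Names such as this.t0 are rejected because \\d+ requires at least one digit between the scale name and the dot.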

If you want to build the pattern dynamically for an arbitrary set of scales, you can do the following:

scales <- c("this", "that")
pattern <- sprintf("(%s)\\d+[a-z]?\\.", paste(scales, collapse = "|"))
dat[c("id", grep(pattern, names(dat), value = TRUE))]
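Note that the pattern above is not anchored, so it would also match a name in which a scale name appears mid-string. If that matters for the real data, one possible variation is to anchor the pattern and spell out the wave suffix. The sketch below assumes every wanted column has the form scale + item + ".t" + digits, as in the question's example; select_items and the demo data frame are hypothetical names introduced only for illustration.

```r
# Hypothetical helper: keep the id column plus the item columns of the given scales.
# Assumes column names look like <scale><digits><optional letter>.t<digits>.
select_items <- function(df, scales, id = "id") {
  # ^ and $ anchor the match to the whole column name
  pattern <- sprintf("^(%s)\\d+[a-z]?\\.t\\d+$", paste(scales, collapse = "|"))
  df[c(id, grep(pattern, names(df), value = TRUE))]
}

# Small made-up data frame, including a name that the unanchored pattern would catch
demo <- data.frame(id = 1:3, this.t0 = 0, this1.t0 = 0,
                   other_this1.t0 = 0, that22a.t1 = 0)

result <- select_items(demo, c("this", "that"))
print(names(result))
```

Scale names that contain regex metacharacters (a dot, for instance) would need escaping before being pasted into the pattern.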
