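I have a dataset that looks like the following:
- an id column
- scale columns measured at two time points (suffixes .t0 and .t1)
- two scale base names (this and that)
- item suffixes that are a number or a number plus a letter (1, 22, 22a)
- other columns that should be dropped (v2, v3, ignore.t0, ignore.t1, this.t0, this.t1, that.t0, that.t1).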
dat <- data.frame(id = seq(from=1, to=10, by=1),
v2 = rnorm(10),
v3 = rnorm(10),
ignore.t0 = rnorm(10),
this.t0 = rnorm(10),
this1.t0 = rnorm(10),
this22.t0 = rnorm(10),
this22a.t0 = rnorm(10),
that.t0 = rnorm(10),
that1.t0 = rnorm(10),
that22.t0 = rnorm(10),
that22a.t0 = rnorm(10),
ignore.t1 = rnorm(10),
this.t1 = rnorm(10),
this1.t1 = rnorm(10),
this22.t1 = rnorm(10),
this22a.t1 = rnorm(10),
that.t1 = rnorm(10),
that1.t1 = rnorm(10),
that22.t1 = rnorm(10),
that22a.t1 = rnorm(10))
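The column names follow a regular scale + item + time layout, so a data frame with the same names can also be generated instead of typed out. This is only a sketch. set.seed, the helper vectors prefixes and measure_cols, and the use of a matrix for the measurement columns are illustrative choices that are not in the original question. id is also an integer vector here rather than the double that seq() produces.

```r
set.seed(1)  # reproducible random values (illustrative, not in the question)

# Scale-plus-item prefixes, in the same order as the hand-written data frame
prefixes <- c("ignore", "this", "this1", "this22", "this22a",
              "that", "that1", "that22", "that22a")

# All .t0 columns first, then all .t1 columns, matching the original order
measure_cols <- paste0(rep(prefixes, times = 2),
                       rep(c(".t0", ".t1"), each = length(prefixes)))

# An unnamed matrix argument contributes its column names to the data frame
dat <- data.frame(id = 1:10,
                  v2 = rnorm(10),
                  v3 = rnorm(10),
                  matrix(rnorm(10 * length(measure_cols)), nrow = 10,
                         dimnames = list(NULL, measure_cols)))

names(dat)
```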
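I'd like to subset the data frame so that it contains id plus only the columns that:
- start with a scale name (this or that) AND
- have a number (1.) or a number and a letter (22a.) immediately before the period.
So in the end, the data frame should look like this: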
dat2 <- data.frame(
id = seq(from=1, to=10, by=1),
#v2 = rnorm(10),
#v3 = rnorm(10),
#ignore.t0 = rnorm(10),
#this.t0 = rnorm(10),
this1.t0 = rnorm(10),
this22.t0 = rnorm(10),
this22a.t0 = rnorm(10),
#that.t0 = rnorm(10),
that1.t0 = rnorm(10),
that22.t0 = rnorm(10),
that22a.t0 = rnorm(10),
#ignore.t1 = rnorm(10),
#this.t1 = rnorm(10),
this1.t1 = rnorm(10),
this22.t1 = rnorm(10),
this22a.t1 = rnorm(10),
#that.t1 = rnorm(10),
that1.t1 = rnorm(10),
that22.t1 = rnorm(10),
that22a.t1 = rnorm(10))
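The real data frame is much larger than shown here, so typing in column indices isn't feasible. Nor can I simply match on the scale names, because this.t0, this.t1, that.t0, and that.t1 would also be captured.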
# not quite right: "this|that" also matches this.t0, this.t1, that.t0 and that.t1
dat2$id <- dat$id
scales <- c("this", "that")
keep.index <- grep(paste(scales,collapse="|"), names(dat))
temp <- dat[keep.index]
dat2 <- cbind(dat2, temp)
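To see concretely why the attempt above over-matches, here is the same "this|that" pattern run over a handful of the example column names. The vector nms is an illustrative subset of the real names, not the full data frame.

```r
# A subset of the column names from the example data
nms <- c("id", "v2", "ignore.t0", "this.t0", "this1.t0",
         "this22a.t0", "that.t1", "that22.t1")

scales <- c("this", "that")

# The pattern is just "this|that", so it matches the scale name anywhere,
# whether or not an item number follows it
naive <- grep(paste(scales, collapse = "|"), nms, value = TRUE)
print(naive)
# The result includes this.t0 and that.t1, which should have been excluded
```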
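How can I modify the grep pattern to look for a number OR (a number and a letter) before the period? Or is there a better approach?
This works for your example: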
dat[c("id", grep("(this|that)\\d+[a-z]?\\.", names(dat), value = TRUE))]
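where:
- \\d+ is one or more digits
- [a-z]? is zero or one lowercase letter
- \\. matches a literal dot

If you want to build the pattern dynamically for different scales, you can do the following: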
scales <- c("this", "that")
pattern <- sprintf("(%s)\\d+[a-z]?\\.", paste(scales, collapse = "|"))
dat[c("id", grep(pattern, names(dat), value = TRUE))]
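A quick way to check what each piece of the pattern does is to run it over a few names. The sketch below builds the pattern the same way as the answer and also tries a variant anchored with ^, which is an addition not in the answer: without the anchor, any name that merely contains a scale name followed by digits would match too. that22ab.t1 and xthis1.t0 are hypothetical names chosen to probe edge cases; they are not columns of the example data.

```r
scales <- c("this", "that")
pattern <- sprintf("(%s)\\d+[a-z]?\\.", paste(scales, collapse = "|"))

# Variant (an assumption, not from the answer): require the scale name
# to be at the very start of the column name
anchored <- paste0("^", pattern)

# Names from the example plus two hypothetical edge cases
candidates <- c("this.t0",      # no digit before the dot
                "this1.t0",     # one digit
                "that22.t1",    # several digits
                "this22a.t0",   # digits plus one lowercase letter
                "ignore.t0",    # not a scale name
                "that22ab.t1",  # two letters: [a-z]? allows at most one
                "xthis1.t0")    # scale name not at the start

result <- data.frame(name = candidates,
                     unanchored = grepl(pattern, candidates),
                     anchored = grepl(anchored, candidates))
print(result)
```

Only xthis1.t0 differs between the two columns. that22ab.t1 is rejected by both, because [a-z]? allows at most one letter before the dot.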