Extract rows from a data.frame based on common values using a list

Question

I'm looking for a simple way to filter the rows of a data.frame based on a list of number sequences.

Here is an example:

My initial data frame:

data <- data.frame(x=c(0,1,2,0,1,2,3,4,5,12,2,0,10,11,12,13),y="other_data")

My list:

list1 <- list(1:5,10:13)

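My goal is to keep only the rows of "data" in which column "x" contains one of the number sequences from "list1" completely and exactly as given. So the output data.frame should be: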

finaldata <- data.frame(x=c(1:5,10:13),y="other_data")
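
A plain membership test is not enough here, which is why the sequences have to be matched as complete runs. As a small illustration using only base R's `%in%` and `unlist()` (the name `naive` is just for this demo), keeping every row whose x value appears anywhere in list1 returns 13 rows instead of the 9 wanted:

```r
data <- data.frame(x = c(0, 1, 2, 0, 1, 2, 3, 4, 5, 12, 2, 0, 10, 11, 12, 13),
                   y = "other_data")
list1 <- list(1:5, 10:13)

# Naive filter: keep any row whose x value occurs in any of the sequences
naive <- data[data$x %in% unlist(list1), ]
nrow(naive)                          # 13, not 9

# Rows 2, 3, 10 and 11 slip through although they are not part of a complete run
which(data$x %in% unlist(list1))     # 2 3 5 6 7 8 9 10 11 13 14 15 16
```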

Any ideas on how to do this?

Answer 1

Her*_*oka 2

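I started with a custom function that selects the matching rows for a single sequence; it is then easy to extend it to every sequence in the list with lapply.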
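
The function relies on base R's `rle()` (run-length encoding) applied to the logical vector `v %in% sequence`. As a small illustration on the question's x column and the sequence `1:5` (the variable names `r` and `keep_run` exist only for this demo), the membership test produces seven runs, and only the fourth run is both TRUE and exactly five elements long:

```r
# Column x from the question's data frame
x <- c(0, 1, 2, 0, 1, 2, 3, 4, 5, 12, 2, 0, 10, 11, 12, 13)

# Run-length encode the membership test for the sequence 1:5
r <- rle(x %in% 1:5)
r$lengths   # 1 2 1 5 1 1 5
r$values    # FALSE TRUE FALSE TRUE FALSE TRUE FALSE

# A run qualifies only if it is all TRUE and exactly as long as the sequence
keep_run <- r$values & r$lengths == length(1:5)

# Expand the per-run flags back to one flag per element
which(rep(keep_run, r$lengths))   # 5 6 7 8 9
```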

#function that takes a sequence and a vector and returns a logical
#vector marking the elements of v that form a complete run of the sequence
get_row_indices<- function(sequence,v){
  #get run lengths of whether vector is in sequence
  rle_d <- rle(v %in% sequence)
  #test if it's complete, so both v in sequence and length of 
  #matches is length of sequence
  select <- rep(length(sequence)==rle_d$lengths &rle_d$values,rle_d$lengths)

  return(select)

}


#add row ID to data to show selection
data$row_id <- 1:nrow(data)
res <- do.call(rbind,lapply(list1,function(x){
  return(data[get_row_indices(sequence=x,v=data$x),])
}))

res

> res
    x          y row_id
5   1 other_data      5
6   2 other_data      6
7   3 other_data      7
8   4 other_data      8
9   5 other_data      9
13 10 other_data     13
14 11 other_data     14
15 12 other_data     15
16 13 other_data     16
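
Note that get_row_indices() only checks that a run has the right length and that every value in it belongs to the sequence; it does not check the order. For instance, `get_row_indices(1:5, c(5, 4, 3, 2, 1))` marks all five elements as TRUE. If the order must match exactly, one option is a sliding-window comparison. The sketch below is an alternative, not part of the original answer, and `match_exact_runs()` is a hypothetical helper name:

```r
# Hypothetical helper (not from the original answer): returns a logical vector
# that is TRUE for every element of v lying inside a window equal to seq_vals,
# element by element and in the same order.
match_exact_runs <- function(seq_vals, v) {
  k <- length(seq_vals)
  n <- length(v)
  keep <- logical(n)                      # all FALSE to start
  if (k == 0 || k > n) return(keep)
  for (start in seq_len(n - k + 1)) {
    idx <- start:(start + k - 1)
    # isTRUE() treats NA comparisons as "no match"
    if (isTRUE(all(v[idx] == seq_vals))) keep[idx] <- TRUE
  }
  keep
}

data <- data.frame(x = c(0, 1, 2, 0, 1, 2, 3, 4, 5, 12, 2, 0, 10, 11, 12, 13),
                   y = "other_data")
list1 <- list(1:5, 10:13)

# One mask per sequence, combined with OR so each row is kept at most once
keep <- Reduce(`|`, lapply(list1, match_exact_runs, v = data$x))
finaldata <- data[keep, ]
finaldata$x                               # 1 2 3 4 5 10 11 12 13

# Order now matters: a reversed run is rejected
match_exact_runs(1:5, c(5, 4, 3, 2, 1))   # FALSE FALSE FALSE FALSE FALSE
```

Combining the masks with Reduce(`|`, ...) keeps rows in their original order and includes a row only once, whereas do.call(rbind, ...) stacks one subset per sequence and would repeat a row matched by more than one sequence. The loop does O(n * k) comparisons per sequence, which is fine for data of this size.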

Archived:	10 years, 7 months ago
Views:	171
Last activity:	10 years, 7 months ago