Extract values from a data frame column based on a list of indices

use*_*212 2 r subset dataframe

I have a data frame:

df = read.table(text="ID    location    C1  C2  C3  C4  C5  C6
M01 1   A   H   H   A   A   B
M02 2   A   H   A   A   A   B
M03 3   A   B   A   A   A   B
M04 4   H   B   H   A   A   B
M05 5   H   B   H   A   A   B
M06 6   A   B   H   A   A   H
M07 7   A   B   H   B   A   H
M08 8   A   B   H   A   A   H
M09 9   A   B   H   A   A   H
M10 10  B   B   H   A   A   H
M11 11  A   B   H   A   A   H
M12 12  A   B   H   A   A   H
M13 13  A   B   H   A   A   H
M14 14  B   B   B   A   A   H
M15 15  B   B   B   A   A   A", header=TRUE, stringsAsFactors=FALSE)

I want to extract values from df$ID based on a list of row-number indices into df. The list a is:

a = list(C1 = c(3,   5,   9,   10,  13), C2 = c(2) , 
C3 = c(1,   3,   13 ), C4 =c(6,   7 ), C6 = c(5,   14 ))

The expected result is:

$C1
[1] "M03" "M05" "M09" "M10" "M13"

$C2
[1] "M02"

$C3
[1] "M01" "M03" "M13"

$C4
[1] "M06" "M07"

$C6
[1] "M05" "M14"

Ric*_*ven 7

You can unlist a, use the flattened vector to index df$ID, and then relist the result using a itself as the skeleton.

relist(df$ID[unlist(a)], a)
# $C1
# [1] "M03" "M05" "M09" "M10" "M13"
#
# $C2
# [1] "M02"
#
# $C3
# [1] "M01" "M03" "M13"
#
# $C4
# [1] "M06" "M07"
#
# $C6
# [1] "M05" "M14"
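To make the three steps explicit, here is a minimal sketch that splits the one-liner into parts. The intermediate names idx, ids and res are only illustrative, and df is rebuilt with just the ID column (an assumption made to keep the example short; the other columns are not needed).

```r
# Stand-in for the question's data: only the ID column matters here.
df <- data.frame(ID = sprintf("M%02d", 1:15), stringsAsFactors = FALSE)
a <- list(C1 = c(3, 5, 9, 10, 13), C2 = c(2), C3 = c(1, 3, 13),
          C4 = c(6, 7), C6 = c(5, 14))

# Step 1: flatten the list into a single vector of row numbers.
idx <- unlist(a)

# Step 2: index the ID column once with the flattened vector.
ids <- df$ID[idx]

# Step 3: regroup the flat result, using a as the skeleton
# that supplies the element lengths and names.
res <- relist(ids, a)
```

The point of the design is that df$ID is indexed once, rather than once per list element.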

Additionally, we can improve the speed by dropping the names in unlist.

relist(df$ID[unlist(a, use.names = FALSE)], a)
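Dropping the names should not change the result, because relist takes the names from the skeleton rather than from the flattened vector. A quick sketch to check this on the question's data (df is rebuilt with only the ID column, and the names with_names, without_names and same are illustrative):

```r
df <- data.frame(ID = sprintf("M%02d", 1:15), stringsAsFactors = FALSE)
a <- list(C1 = c(3, 5, 9, 10, 13), C2 = c(2), C3 = c(1, 3, 13),
          C4 = c(6, 7), C6 = c(5, 14))

# Default: unlist() builds a name for every flattened element.
with_names <- relist(df$ID[unlist(a)], a)

# use.names = FALSE skips building those names.
without_names <- relist(df$ID[unlist(a, use.names = FALSE)], a)

# Compare the two results.
same <- identical(with_names, without_names)
```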

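Note:

The benchmark in the other answer is misleading. Here is a more accurate benchmark. It runs the actual code from the other answer, which extracts with $ on every iteration, and it drops the unnecessary {} braces around the expression: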

df <- data.frame(v1 = paste0("M", 1:1e6))
set.seed(24)
a1 <- lapply(1:1e4, function(i) sample(1:1e6, sample(1e3), replace=FALSE))

system.time(relist(df$v1[unlist(a1, use.names = FALSE)], a1))
#   user  system elapsed 
#  0.485   0.004   0.489 
system.time(lapply(a1, function(x) df$v1[x]))
#   user  system elapsed 
#   0.39    0.00    0.39 
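For reference, the lapply form timed above can be applied to the question's own data in the same way. This is a sketch of that pattern, not a quote of the other answer; df is rebuilt with only the ID column and res_lapply is an illustrative name.

```r
df <- data.frame(ID = sprintf("M%02d", 1:15), stringsAsFactors = FALSE)
a <- list(C1 = c(3, 5, 9, 10, 13), C2 = c(2), C3 = c(1, 3, 13),
          C4 = c(6, 7), C6 = c(5, 14))

# Index df$ID separately for each element of a; lapply keeps the list names.
res_lapply <- lapply(a, function(x) df$ID[x])
```

Both approaches are close in the timings shown, so the choice between them is largely a matter of readability.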