I have a data frame:
df = read.table(text="ID location C1 C2 C3 C4 C5 C6
M01 1 A H H A A B
M02 2 A H A A A B
M03 3 A B A A A B
M04 4 H B H A A B
M05 5 H B H A A B
M06 6 A B H A A H
M07 7 A B H B A H
M08 8 A B H A A H
M09 9 A B H A A H
M10 10 B B H A A H
M11 11 A B H A A H
M12 12 A B H A A H
M13 13 A B H A A H
M14 14 B B B A A H
M15 15 B B B A A A", header = TRUE, stringsAsFactors = FALSE)
I want to extract the values of df$ID based on a list of row indices into df. The list, a, is:
a = list(C1 = c(3, 5, 9, 10, 13), C2 = c(2) ,
C3 = c(1, 3, 13 ), C4 =c(6, 7 ), C6 = c(5, 14 ))
The expected result is:
$C1
[1] "M03" "M05" "M09" "M10" "M13"
$C2
[1] "M02"
$C3
[1] "M01" "M03" "M13"
$C4
[1] "M06" "M07"
$C6
[1] "M05" "M14"
You can unlist a, index the ID values with the flattened vector, and then relist the result using a itself as the skeleton.
relist(df$ID[unlist(a)], a)
# $C1
# [1] "M03" "M05" "M09" "M10" "M13"
#
# $C2
# [1] "M02"
#
# $C3
# [1] "M01" "M03" "M13"
#
# $C4
# [1] "M06" "M07"
#
# $C6
# [1] "M05" "M14"
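The one-liner above can be read as three separate steps. The following is a minimal sketch, assuming a stand-in df that holds only the ID column (built here with sprintf for brevity rather than read.table); the names idx, ids and res are introduced only for illustration.

```r
# Stand-in for the question's data: 15 IDs, "M01" to "M15" (assumption: only ID is needed)
df <- data.frame(ID = sprintf("M%02d", 1:15), stringsAsFactors = FALSE)
a <- list(C1 = c(3, 5, 9, 10, 13), C2 = c(2),
          C3 = c(1, 3, 13), C4 = c(6, 7), C6 = c(5, 14))

# Step 1: flatten the list into a single vector of row indices
idx <- unlist(a)

# Step 2: index the ID column once with that vector
ids <- df$ID[idx]

# Step 3: cut the flat result back into the shape and names of a
res <- relist(ids, skeleton = a)
res
```

Because the indexing happens once on the flattened vector, the list structure only matters in the final relist step, which reuses the lengths and names of a.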
Additionally, we can speed up unlist by dropping the names.
relist(df$ID[unlist(a, use.names = FALSE)], a)
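To see what use.names = FALSE saves, it may help to look at what unlist does by default. This sketch uses a shortened version of a (an assumption made to keep the output small); with_names and no_names are illustrative names only.

```r
a <- list(C1 = c(3, 5, 9, 10, 13), C2 = c(2))

# By default, unlist constructs a name for every element of the result
with_names <- unlist(a)
names(with_names)
# [1] "C11" "C12" "C13" "C14" "C15" "C2"

# With use.names = FALSE that name construction is skipped entirely
no_names <- unlist(a, use.names = FALSE)
names(no_names)
# NULL

# The values themselves are the same, so the indexing result is unchanged
identical(unname(with_names), no_names)
```

The names are never used when indexing df$ID, so building them is wasted work, which becomes noticeable on long lists.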
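Note:
The benchmark in the other answer is misleading. Here is a more accurate benchmark that uses the actual code from the other answer, which extracts with $ on every iteration, and that drops the unnecessary {} braces around the expression.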
df <- data.frame(v1 = paste0("M", 1:1e6))
set.seed(24)
a1 <- lapply(1:1e4, function(i) sample(1:1e6, sample(1e3), replace=FALSE))
system.time(relist(df$v1[unlist(a1, use.names = FALSE)], a1))
# user system elapsed
# 0.485 0.004 0.489
system.time(lapply(a1, function(x) df$v1[x]))
# user system elapsed
# 0.39 0.00 0.39
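Before comparing timings, it can be worth confirming that the two approaches return the same values. The following is a small-scale sketch, not the benchmark above: the data size, the seed, and the use of sample(20, 1) to draw a single length per element are all assumptions chosen so that the check runs quickly.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible
df <- data.frame(v1 = paste0("M", 1:1000), stringsAsFactors = FALSE)

# 50 index vectors, each with a random length between 1 and 20
a1 <- lapply(1:50, function(i) sample(1000, sample(20, 1)))

# Approach 1: flatten, index once, restore the list shape
res_relist <- relist(df$v1[unlist(a1, use.names = FALSE)], a1)

# Approach 2: index separately for each list element
res_lapply <- lapply(a1, function(x) df$v1[x])

# Compare values only, ignoring any attributes
same <- isTRUE(all.equal(res_relist, res_lapply, check.attributes = FALSE))
same
```

If same is TRUE, any difference between the two system.time calls reflects speed only; absolute timings will vary by machine and R version.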