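I asked a similar question here, and the solution given there works well for that problem, but this one is slightly more difficult.
I have a data table like this.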
ID1 member
1 a parent
2 a child
3 a parent
4 a child
5 a child
6 b parent
7 b parent
8 b child
9 c child
10 c child
11 c parent
12 c child
I want to assign a sequence like the one shown below, based on the ID1 and member columns.
ID1 member sequence
1 a parent 1
2 a child 2
3 a parent 1
4 a child 2
5 a child 3
6 b parent 1
7 b parent 1
8 b child 2
9 c child 2 *
10 c child 3
11 c parent 1
12 c child 2
That is:
> dt$sequence = 1, wherever dt$member == "parent"
> dt$sequence = previous_row_value + 1, wherever dt$member=="child"
However, a new ID1 does not always start with member == "parent". If it starts with "child" (as in the row marked with an asterisk), the sequence has to start at 2. So far I have been using a loop, as shown below.
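# Assign the sequence for one ID1 group, row by row.
# Defined before it is used in the grouped call below.
sequencing <- function(dt){
  for(i in 1:nrow(dt)){
    if(i == 1){
      # First row of the group: a leading "child" starts at 2
      if(dt[i,member] %in% "child")
        dt$sequence[i] = 2
      else
        dt$sequence[i] = 1
    }
    else{
      # Later rows: a "child" continues the count, a "parent" resets it to 1
      if(dt[i,member] %in% "child")
        dt$sequence[i] = as.numeric(dt$sequence[i-1]) + 1
      else
        dt$sequence[i] = 1
    }
  }
  return(dt)
}
# Apply the function to each ID1 group
dt_sequence <- dt[ ,sequencing(.SD), by="ID1"]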
I ran this code on a data table with 4e5 rows, and it takes a long time to finish (about 20 minutes). Can anyone suggest a faster way to do this?
Rol*_*and 11
DF <- read.table(text=" ID1 member
1 a parent
2 a child
3 a parent
4 a child
5 a child
6 b parent
7 b parent
8 b child
9 c child
10 c child
11 c parent
12 c child", header=TRUE, stringsAsFactors=FALSE)
library(data.table)
setDT(DF)
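# Group by ID1 and by the running count of "parent" rows, so each group is
# one parent plus its following children (or a leading run of children).
# Within a group, count 1, 2, 3, ... and add 1 if the group starts with "child".
DF[, sequence := seq_along(member) + (member[1] == "child"),
by = list(ID1, cumsum(member == "parent"))]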
# ID1 member sequence
# 1: a parent 1
# 2: a child 2
# 3: a parent 1
# 4: a child 2
# 5: a child 3
# 6: b parent 1
# 7: b parent 1
# 8: b child 2
# 9: c child 2
#10: c child 3
#11: c parent 1
#12: c child 2
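To see why grouping by the running count of parents works, it may help to look at the intermediate values. The following is a minimal base R sketch of the same idea, not the data.table code from the answer. The names `run` and `grp` are introduced here purely for illustration, and the data frame is rebuilt by hand from the example in the question.

```r
# Same example data as in the question, rebuilt as a plain data frame
DF <- data.frame(
  ID1    = c("a", "a", "a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
  member = c("parent", "child", "parent", "child", "child", "parent",
             "parent", "child", "child", "child", "parent", "child"),
  stringsAsFactors = FALSE
)

# Running count of parents: goes up by one at every "parent" row
run <- cumsum(DF$member == "parent")
# run is 1 1 2 2 2 3 4 4 4 4 5 5

# One group per (ID1, run) pair. Rows 9-10 (ID1 "c") share run == 4 with
# rows 7-8 (ID1 "b"), so ID1 is needed to keep them apart.
grp <- interaction(DF$ID1, run, drop = TRUE)

# Within each group count 1, 2, 3, ... and shift by one
# when the group starts with a child (is_child[1] == 1)
DF$sequence <- ave(
  as.integer(DF$member == "child"), grp,
  FUN = function(is_child) seq_along(is_child) + is_child[1]
)

print(DF)
```

Because `cumsum` and `ave` work on whole vectors, nothing here loops over rows in R code, which is presumably where most of the 20 minutes went. On large tables the data.table version above is likely the better choice; this sketch is only meant to make the grouping visible.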