R data.table: label each run of consecutive non-NA values with its length

Sha*_*ang 2 label r data.table

Here is my data.table, which has a single column, column1.

library(data.table)

dt = data.table(column1 = c(NA, NA, "A", "A", "A", NA, NA, NA, NA, "B", NA, NA, "1 2", "1 2", NA, NA, "A", "A", "A", "A", "A", NA, NA, NA, NA, ...))

> print(dt)
    column1
 1:      NA
 2:      NA
 3:       A
 4:       A
 5:       A
 6:      NA
 7:      NA
 8:      NA
 9:      NA
10:       B
11:      NA
12:      NA
13:     1 2
14:     1 2
15:      NA
16:      NA
17:       A
18:       A
19:       A
20:       A
21:       A
22:      NA
23:      NA
24:      NA
25:      NA
...     ...

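column1 contains either NA or character values. I want to label each run of consecutive non-NA values with the number of items in that run. This is the expected result, in dt$labels: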

> print(dt)
    column1 labels
 1:      NA      0
 2:      NA      0
 3:       A      3
 4:       A      3
 5:       A      3
 6:      NA      0
 7:      NA      0
 8:      NA      0
 9:      NA      0
10:       B      1
11:      NA      0
12:      NA      0
13:     1 2      2
14:     1 2      2
15:      NA      0
16:      NA      0
17:       A      5
18:       A      5
19:       A      5
20:       A      5
21:       A      5
22:      NA      0
23:      NA      0
24:      NA      0
25:      NA      0
...     ...    ...

That is, there are 3 consecutive "A", then 1 "B", then 2 "1 2", and then 5 "A".

Using rle()

x <- rle(dt$column1) 

gives the length of each run (note that rle() treats every NA as its own run of length 1):

Run Length Encoding
  lengths: int [1:18] 1 1 3 1 1 1 1 1 1 1 ...
  values : chr [1:18] NA NA "A" NA NA NA NA "B" NA NA "1 2" ...

But I don't know how to map these lengths onto a labels column in the data.table.

akr*_*run 6

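We can use rleid from data.table to create a grouping variable on is.na(column1), then multiply the logical vector !is.na(column1) by .N (the group size) and assign the result with := to a new column ('labels').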

dt[, labels := .N*!is.na(column1), rleid(is.na(column1))]
dt
#    column1 labels
# 1:      NA      0
# 2:      NA      0
# 3:       A      3
# 4:       A      3
# 5:       A      3
# 6:      NA      0
# 7:      NA      0
# 8:      NA      0
# 9:      NA      0
#10:       B      1
#11:      NA      0
#12:      NA      0
#13:     1 2      2
#14:     1 2      2
#15:      NA      0
#16:      NA      0
#17:       A      5
#18:       A      5
#19:       A      5
#20:       A      5
#21:       A      5
#22:      NA      0
#23:      NA      0
#24:      NA      0
#25:      NA      0
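To see why the one-liner works, it can help to build the pieces one at a time. The following is only an illustrative sketch on the data from this answer; the intermediate columns grp and n are names made up for the example and are not part of the original solution.

```r
library(data.table)

dt <- data.table(column1 = c(NA, NA, "A", "A", "A", NA, NA, NA, NA, "B",
  NA, NA, "1 2", "1 2", NA, NA, "A", "A", "A", "A", "A", NA, NA, NA, NA))

# rleid() starts a new id every time is.na(column1) flips between TRUE and FALSE
# (grp is a hypothetical helper column, used here only for illustration)
dt[, grp := rleid(is.na(column1))]

# .N is the number of rows in each group, for NA runs and non-NA runs alike
# (n is likewise a hypothetical helper column)
dt[, n := .N, by = grp]

# Multiplying by the logical (FALSE = 0, TRUE = 1) zeroes out the NA runs
dt[, labels := n * (!is.na(column1))]
```

Because the grouping is on is.na(column1) rather than on the values themselves, two different non-NA values that sit next to each other would be counted as one run.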

Data

dt <- data.table(column1 = c(NA, NA, "A", "A", "A", NA, NA, NA, NA, "B", 
  NA, NA, "1 2", "1 2", NA, NA, "A", "A", "A", "A", "A", NA, NA, NA, NA))
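The rle() attempt from the question can probably also be made to work by running rle() on the non-NA indicator instead of on the values, so that each stretch of NAs collapses into a single run. A minimal base R sketch, assuming the same 25-row data as above (the variable name r is only for illustration):

```r
library(data.table)

dt <- data.table(column1 = c(NA, NA, "A", "A", "A", NA, NA, NA, NA, "B",
  NA, NA, "1 2", "1 2", NA, NA, "A", "A", "A", "A", "A", NA, NA, NA, NA))

# Run-length encode the logical vector !is.na(column1); it contains no NA,
# so consecutive NA rows in column1 form one run here
r <- rle(!is.na(dt$column1))

# r$values is TRUE for non-NA runs and FALSE for NA runs, so the product
# keeps the length of non-NA runs and zeroes the rest; rep() then expands
# each run's label back to one entry per row
dt[, labels := rep(r$lengths * r$values, times = r$lengths)]
```

This yields the same labels as the rleid() version; the rleid() approach simply keeps everything inside a single data.table call.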