Sha*_*ang 2 label r data.table
This is my data table, with a single column column1.
library(data.table)
dt = data.table(column1 = c(NA, NA, "A", "A", "A", NA, NA, NA, NA, "B", NA, NA, "1 2", "1 2", NA, NA, "A", "A", "A", "A", "A", NA, NA, NA, NA, ...))
> print(dt)
column1
1: NA
2: NA
3: A
4: A
5: A
6: NA
7: NA
8: NA
9: NA
10: B
11: NA
12: NA
13: 1 2
14: 1 2
15: NA
16: NA
17: A
18: A
19: A
20: A
21: A
22: NA
23: NA
24: NA
25: NA
... ...
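The values in column1 are either NA or character strings. I want to label each run of consecutive non-NA values with the number of items in that run. This is the expected result, in dt$labels: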
> print(dt)
column1 labels
1: NA 0
2: NA 0
3: A 3
4: A 3
5: A 3
6: NA 0
7: NA 0
8: NA 0
9: NA 0
10: B 1
11: NA 0
12: NA 0
13: 1 2 2
14: 1 2 2
15: NA 0
16: NA 0
17: A 5
18: A 5
19: A 5
20: A 5
21: A 5
22: NA 0
23: NA 0
24: NA 0
25: NA 0
... ... ...
There are 3 consecutive "A", then 1 "B", 2 "1 2", and 5 "A".
Using rle() like this:
x <- rle(dt$column1)
gives the length of each run of identical values (each NA is counted as its own run of length 1):
Run Length Encoding
lengths: int [1:18] 1 1 3 1 1 1 1 1 1 1 ...
values : chr [1:18] NA NA "A" NA NA NA NA "B" NA NA "1 2" ...
But I don't know how to map these lengths onto the data.table column labels.
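We can use rleid from data.table to create a grouping variable on is.na(column1), then multiply the logical vector !is.na(column1) by .N within each group and assign (:=) the result to a new column, labels.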
dt[, labels := .N*!is.na(column1), rleid(is.na(column1))]
dt
# column1 labels
# 1: NA 0
# 2: NA 0
# 3: A 3
# 4: A 3
# 5: A 3
# 6: NA 0
# 7: NA 0
# 8: NA 0
# 9: NA 0
#10: B 1
#11: NA 0
#12: NA 0
#13: 1 2 2
#14: 1 2 2
#15: NA 0
#16: NA 0
#17: A 5
#18: A 5
#19: A 5
#20: A 5
#21: A 5
#22: NA 0
#23: NA 0
#24: NA 0
#25: NA 0
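To see why the one-liner works, it can help to split it into two steps. The sketch below is only an illustration: the helper column name grp is made up here and is not part of the original answer, and the data is the same 25-row example.

```r
library(data.table)

dt <- data.table(column1 = c(NA, NA, "A", "A", "A", NA, NA, NA, NA, "B",
                             NA, NA, "1 2", "1 2", NA, NA, "A", "A", "A", "A", "A",
                             NA, NA, NA, NA))

# Step 1: rleid() gives a run id that increments every time
# is.na(column1) switches between TRUE and FALSE.
# 'grp' is a hypothetical helper column, kept only for inspection.
dt[, grp := rleid(is.na(column1))]

# Step 2: within each run, .N is the run length. Multiplying by the
# logical !is.na(column1) keeps .N for non-NA runs (TRUE counts as 1)
# and turns NA runs into 0 (FALSE counts as 0).
dt[, labels := .N * !is.na(column1), by = grp]

print(dt)
```

Because the grouping is on is.na(column1) and not on the values themselves, two different non-NA values that sit next to each other without an NA in between would be counted as one run. In the example data every run is separated by NA, so this does not matter there.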
Data:
dt <- data.table(column1 = c(NA, NA, "A", "A", "A", NA, NA, NA, NA, "B",
                             NA, NA, "1 2", "1 2", NA, NA, "A", "A", "A", "A", "A",
                             NA, NA, NA, NA))
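For anyone who wants to stay closer to the rle() attempt in the question, the lengths can be mapped back to rows with rep(). This is a base R sketch, not the answer's method, and it assumes runs are defined by NA versus non-NA (the same grouping as rleid(is.na(column1))). Calling rle() on is.na(x) instead of on x avoids the problem seen in the question, where every NA shows up as a separate run of length 1. The names x, r, run_len and labels are illustrative.

```r
x <- c(NA, NA, "A", "A", "A", NA, NA, NA, NA, "B",
       NA, NA, "1 2", "1 2", NA, NA, "A", "A", "A", "A", "A",
       NA, NA, NA, NA)

# Runs of TRUE (NA) and FALSE (non-NA); is.na() never returns NA,
# so consecutive NAs collapse into a single run.
r <- rle(is.na(x))

# Keep the run length for non-NA runs, use 0 for NA runs.
run_len <- ifelse(r$values, 0L, r$lengths)

# Expand each per-run label back to one value per original row.
labels <- rep(run_len, times = r$lengths)

print(labels)
```

The resulting vector has one entry per row of the input, so it can be assigned as a column, for example with dt[, labels := labels] on a table built from the same data.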