Identify new values cumulatively by group in data.table in R

Geo*_*sky 3 r dataframe dplyr data.table tidyverse

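How can I create a new column that flags, cumulatively across the groups defined by the unique combinations of Year + Month, the first appearance of new values in the Letter column?

Sample data: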

library(data.table)
dt <- data.table(Letter = LETTERS[c(5, 1:2, 1:2, 1:4, 3:6)],
                 Year = 2018,
                 Month = c(rep(5, 5), rep(6, 4), rep(7, 4)))

Printed:

    Letter Year Month
 1:      E 2018     5
 2:      A 2018     5
 3:      B 2018     5
 4:      A 2018     5
 5:      B 2018     5
 6:      A 2018     6
 7:      B 2018     6
 8:      C 2018     6
 9:      D 2018     6
10:      C 2018     7
11:      D 2018     7
12:      E 2018     7
13:      F 2018     7

The result I am trying to get:

    Letter Year Month   New
 1:      E 2018     5  TRUE
 2:      A 2018     5  TRUE
 3:      B 2018     5  TRUE
 4:      A 2018     5  TRUE
 5:      B 2018     5  TRUE
 6:      A 2018     6 FALSE
 7:      B 2018     6 FALSE
 8:      C 2018     6  TRUE
 9:      D 2018     6  TRUE
10:      C 2018     7 FALSE
11:      D 2018     7 FALSE
12:      E 2018     7 FALSE
13:      F 2018     7  TRUE

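The logic in detail:

  1. By default, everything in group 1 ("E","A","B","A","B") is TRUE, because there is nothing earlier to compare it with.
  2. Which letters in group 2 ("A","B","C","D") do not repeat a letter from group 1.
  3. Then, which letters in group 3 ("C","D","E","F") do not repeat a letter from groups 1 and 2 ("E","A","B","A","B","A","B","C","D").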
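The three steps above can be spelled out as an explicit loop. This is only a sketch of the rule, not a recommended solution: the names `seen`, `is_new` and `groups` are made up for illustration, and it assumes the groups should be visited in chronological Year + Month order.

```r
library(data.table)

dt <- data.table(Letter = LETTERS[c(5, 1:2, 1:2, 1:4, 3:6)],
                 Year = 2018,
                 Month = c(rep(5, 5), rep(6, 4), rep(7, 4)))

seen   <- character(0)        # letters observed in earlier groups (hypothetical name)
is_new <- logical(nrow(dt))   # result vector, FALSE by default (hypothetical name)

# One row per Year + Month combination, in chronological order.
groups <- unique(dt[order(Year, Month), .(Year, Month)])

for (i in seq_len(nrow(groups))) {
  # Rows of dt that belong to the current group.
  idx <- which(dt$Year == groups$Year[i] & dt$Month == groups$Month[i])
  # A letter is new if no earlier group contained it.
  is_new[idx] <- !(dt$Letter[idx] %in% seen)
  # Only after flagging the whole group do its letters count as seen.
  seen <- union(seen, dt$Letter[idx])
}

is_new
```

Because `seen` is updated only after a whole group has been flagged, repeats inside the same group (the second "A" and "B" in month 5) still count as new.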

Fra*_*ank 5

Initialize to FALSE; then join on the first Year-Month of each Letter and update to TRUE:

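# Start with every row flagged as not new.
dt[, v := FALSE]
# unique(dt, by = "Letter") keeps the first row of each Letter; join on it and flag the matches.
dt[unique(dt, by = "Letter"), on = .(Letter, Year, Month), v := TRUE][]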

    Letter Year Month     v
 1:      E 2018     5  TRUE
 2:      A 2018     5  TRUE
 3:      B 2018     5  TRUE
 4:      A 2018     5  TRUE
 5:      B 2018     5  TRUE
 6:      A 2018     6 FALSE
 7:      B 2018     6 FALSE
 8:      C 2018     6  TRUE
 9:      D 2018     6  TRUE
10:      C 2018     7 FALSE
11:      D 2018     7 FALSE
12:      E 2018     7 FALSE
13:      F 2018     7  TRUE
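
The join above relies on `unique(dt, by = "Letter")` keeping the first row of each Letter, so it assumes the rows are already sorted by Year and Month. If that cannot be guaranteed, one possible variant (a sketch, not part of the answer above) compares each row with the earliest Year-Month of its Letter. The helper column `ym` and the result column `New` are names chosen here for illustration.

```r
library(data.table)

dt <- data.table(Letter = LETTERS[c(5, 1:2, 1:2, 1:4, 3:6)],
                 Year = 2018,
                 Month = c(rep(5, 5), rep(6, 4), rep(7, 4)))

# Collapse Year and Month into one sortable number (hypothetical helper column).
dt[, ym := Year * 12 + Month]

# A row is new when it falls in the earliest Year-Month in which its Letter appears.
dt[, New := ym == min(ym), by = Letter]

# Drop the helper column again.
dt[, ym := NULL]

dt[]
```

On the sample data this gives the same flags as the requested `New` column, and the result does not change if the rows are shuffled first.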