Function to track changes in a field

Question

Rez*_*eza 6 r sas change-tracking sequence hierarchical-data

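I need a function (in Base SAS or RStudio) that lets me determine the ID number as of a given date, together with the original (root) ID number as of a start date. The dataset contains the old ID, the new ID, and the date on which the ID changed. Sample data: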

OldID	NewID	ChangeDate
1	2	1/1/10
10	11	1/1/10
2	3	7/1/10
3	4	7/10/10
11	12	8/1/10

I need to know the ID number as of 7/15/10 and the original (root) ID as of 1/1/10. The output should look like this:

OrigID	LastID
1	4
10	11

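Then I need a flag that helps me count the number of OrigIDs that changed within a given interval (in this case 1/1/10 to 7/15/10). I also need similar counts for several dates after 7/15/10.

Is there a function in Base SAS or RStudio that can do this?

The functionality I researched in SAS/R (hierarchical loggers, synchronous tracking, sequence-tracking functions) does not seem to fit this task (e.g., logger, lumberjack, log4r, validate, futile.logger).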

Answer 1

Moo*_*per 3

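This should work; I was just too lazy to type in proper dates.

Note: this assumes the data is sorted in the order in which the changes occurred.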
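If the real data stores dates as text in the question's m/d/yy form, the ordering assumption can be met by parsing and sorting first. The following is only a sketch under that assumption; the data frame name `changes` and the deliberately shuffled input rows are illustrative and not part of the original answer.

```r
# Illustrative input: the question's rows, deliberately out of order,
# with dates as m/d/yy strings (an assumption about the raw data)
changes <- data.frame(
  OldID = c(3, 1, 11, 10, 2),
  NewID = c(4, 2, 12, 11, 3),
  ChangeDate = c("7/10/10", "1/1/10", "8/1/10", "1/1/10", "7/1/10"),
  stringsAsFactors = FALSE
)

# Parse the strings into Date objects so comparisons are chronological
changes$ChangeDate <- as.Date(changes$ChangeDate, format = "%m/%d/%y")

# Sort by change date; ties keep their original relative order
changes <- changes[order(changes$ChangeDate), ]
rownames(changes) <- NULL

changes
```

With a real Date column, the `from` and `to` arguments of a date filter can be Date values as well, since Date objects compare with `>=` and `<=`.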

Data

df <- data.frame(
  OldID = c(1, 10, 2, 3, 11), NewID = c(2, 11, 3, 4, 12), ChangeDate = c(1, 1, 2, 2, 3))
df
#>   OldID NewID ChangeDate
#> 1     1     2          1
#> 2    10    11          1
#> 3     2     3          2
#> 4     3     4          2
#> 5    11    12          3

Function

process <- function(df, from, to) {
  process0 <- function(df, i = 1){
    # fetch new value
    new <- df$NewID[i]
    # check in old column
    j <- match(new, df$OldID)
    
    if(is.na(j)) {
      # if not matched, set i to next row
      i <- i + 1
    } else {
      # else we update current row with new "new" value
      df$NewID[i] <- df$NewID[j]
      # and increment the changes
      df$Changes[i] <- df$Changes[i] + 1
      # and remove obsolete row
      df <- df[-j,]
    }
    # do it all over again except if there is no next row
    if(i <= nrow(df)) process0(df, i) else df
  }
  # filter data frame
  df <- subset(df, ChangeDate >= from & ChangeDate <= to, select = c("OldID", "NewID"))
  # start with 1 change per line
  df$Changes <- 1
  # run recursive function
  process0(df)
}

Result

process(df, 1, 2)
#>   OldID NewID Changes
#> 1     1     4       3
#> 2    10    11       1

Created on 2021-06-09 by the reprex package (v0.3.0)
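For the question's actual dates and its count of changed OrigIDs, one possible variant is a loop-based walk along each chain instead of recursion. This is a sketch of an alternative, not the answer's method: the names `resolve_ids`, `changes`, `start`, and `cutoffs` are hypothetical, and it assumes each old ID appears at most once in the change table and that IDs are never reused.

```r
# Question's sample data with real dates
changes <- data.frame(
  OldID = c(1, 10, 2, 3, 11),
  NewID = c(2, 11, 3, 4, 12),
  ChangeDate = as.Date(c("2010-01-01", "2010-01-01", "2010-07-01",
                         "2010-07-10", "2010-08-01"))
)

# Hypothetical helper: follow each root ID to its last ID within [from, to]
resolve_ids <- function(changes, from, to) {
  # keep only the changes inside the window
  win <- changes[changes$ChangeDate >= from & changes$ChangeDate <= to, ]
  # roots are old IDs that never appear as a new ID inside the window
  roots <- setdiff(win$OldID, win$NewID)
  last <- roots
  n_changes <- integer(length(roots))
  for (k in seq_along(roots)) {
    j <- match(last[k], win$OldID)
    # the nrow() cap guards against endless loops if IDs were ever reused
    while (!is.na(j) && n_changes[k] < nrow(win)) {
      last[k] <- win$NewID[j]
      n_changes[k] <- n_changes[k] + 1L
      j <- match(last[k], win$OldID)
    }
  }
  data.frame(OrigID = roots, LastID = last, Changes = n_changes)
}

start <- as.Date("2010-01-01")
res <- resolve_ids(changes, start, as.Date("2010-07-15"))
# OrigID 1 ends as LastID 4 (3 changes); OrigID 10 ends as LastID 11 (1 change)
res

# Number of OrigIDs that changed, for several cutoff dates
cutoffs <- as.Date(c("2010-07-15", "2010-08-15"))
counts <- vapply(
  seq_along(cutoffs),
  function(i) nrow(resolve_ids(changes, start, cutoffs[i])),
  integer(1)
)
counts
```

Every row of the result has changed at least once, because the table lists only changes. Counting IDs that did not change would need a separate master list of IDs, which the question does not show.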