使用SAS和R对记录进行排序和输出

Mak*_*oto 11 r sas

我有以下数据集

PatientName BVAID   Rank    TreatmentCode   TreatmentID DoseID  
Tim Stuart  BVA-027 3   OP_TBC            1             1  
Tim Stuart  BVA-041 4   OP_TBC            1             1  
Tim Stuart  BVA-021 7   OP_TBC            1             1  
Tim Stuart  BVA-048 10  OP_TBC            1             1  
Tim Stuart  BVA-020 14  OP_TBC            1             1  
Tim Stuart  BVA-024 15  OP_TBC            1             1  
Tim Stuart  BVA-001 16  OP_TBC            1             1  
Tim Stuart  BVA-013 27  OP_TBC            1             1  
Tim Stuart  BVA-018 28  OP_TBC            1             1  
Tim Stuart  BVA-051 29  OP_TBC            1             1  
Tim Stuart  BVA-027 3   OP_TC             2             1  
Tim Stuart  BVA-041 4   OP_TC             2             1  
Tim Stuart  BVA-048 10  OP_TC             2             1  
Tim Stuart  BVA-020 14  OP_TC             2             1    
Tim Stuart  BVA-001 16  OP_TC             2             1  
Tim Stuart  BVA-002 17  OP_TC             2             1    
Tim Stuart  BVA-019 18  OP_TC             2             1  
Tim Stuart  BVA-044 22  OP_TC             2             1  
Tim Stuart  BVA-025 23  OP_TC             2             1  
Tim Stuart  BVA-016 26  OP_TC             2             1  
Tim Stuart  BVA-013 27  OP_TC             2             1  
Tim Stuart  BVA-001 16  OP_SICO           3             1  
Tim Stuart  BVA-002 17  OP_SICO           3             1  
Tim Stuart  BVA-013 27  OP_SICO           3             1  
Run Code Online (Sandbox Code Playgroud)

我需要输出rank每组中最小的记录,TreatmentID但如果记录是在前一TreatmentID组中输出的,我需要选择下一个最小的rank并输出TreamtmentID组的记录- 我每组只需要一条记录TreatmentID.这需要是一个可以自动化的可扩展解决方案.我的输出文件将只有树唯一记录,即每个组一个记录,每个记录都是唯一的,BVAID并且在该组中具有最小的排名.

PatientName BVAID   Rank    TreatmentCode   TreatmentID DoseID  
Tim Stuart  BVA-027 3   OP_TBC            1             1  
Tim Stuart  BVA-041 4   OP_TC             2             1  
Tim Stuart  BVA-001 16  OP_SICO           3             1  
Run Code Online (Sandbox Code Playgroud)

哪个程序可以处理这个更好的SAS或R.

Mat*_*wle 13

紧凑,可扩展和可读的R解决方案:

require(data.table)
DT = as.data.table(dat)   # dat input from Brian's answer
r = 0
DT[,{r<<-min(Rank[Rank>r]); .SD[Rank==r]}, by=TreatmentID]

     TreatmentID PatientName   BVAID Rank TreatmentCode DoseID
[1,]           1  Tim Stuart BVA-027    3        OP_TBC      1
[2,]           2  Tim Stuart BVA-041    4         OP_TC      1
[3,]           3  Tim Stuart BVA-001   16       OP_SICO      1
Run Code Online (Sandbox Code Playgroud)


Jos*_*ien 5

这是一个R解决方案.我真的很想知道是否有一种比这更紧凑的方法.

library(plyr)

df <- df[order(df$PatientName, df$TreatmentID),]

ddply(df, .(PatientName), function(DF) {
    # For each Treatment, find the value of Rank to be kept    
    splitRanks <- split(DF$Rank, DF$TreatmentID)
    minRanks <- Reduce(f = function(X, Y) min(Y[Y>min(X)]), 
                       x = splitRanks[-1], 
                       init = min(splitRanks[[1]]), accumulate = TRUE)
    # For each Treatment, extract row w/ Rank determined by the calculation above
    splitDF <- split(DF, DF$TreatmentID)
    rows <- mapply(FUN = function(X, Y) X[X$Rank==Y,], splitDF, minRanks, 
                   SIMPLIFY = FALSE)
    # Bind the extracted rows back together in a data frame
    do.call("rbind", rows)
})

#   PatientName   BVAID Rank TreatmentCode TreatmentID DoseID
# 1  Tim Stuart BVA-027    3        OP_TBC           1      1
# 2  Tim Stuart BVA-041    4         OP_TC           2      1
# 3  Tim Stuart BVA-001   16       OP_SICO           3      1
Run Code Online (Sandbox Code Playgroud)


Rob*_*dge 5

我的SAS解决方案.所有步骤均可扩展:

data test;
  input  PatientName $ 1-10
         BVAID $ 
         Rank    
         TreatmentCode $  
         TreatmentID 
         DoseID 
        ;
datalines;
Tim Stuart  BVA-027 3   OP_TBC            1             1  
Tim Stuart  BVA-041 4   OP_TBC            1             1  
Tim Stuart  BVA-021 7   OP_TBC            1             1  
Tim Stuart  BVA-048 10  OP_TBC            1             1  
Tim Stuart  BVA-020 14  OP_TBC            1             1  
Tim Stuart  BVA-024 15  OP_TBC            1             1  
Tim Stuart  BVA-001 16  OP_TBC            1             1  
Tim Stuart  BVA-013 27  OP_TBC            1             1  
Tim Stuart  BVA-018 28  OP_TBC            1             1  
Tim Stuart  BVA-051 29  OP_TBC            1             1  
Tim Stuart  BVA-027 3   OP_TC             2             1  
Tim Stuart  BVA-041 4   OP_TC             2             1  
Tim Stuart  BVA-048 10  OP_TC             2             1  
Tim Stuart  BVA-020 14  OP_TC             2             1    
Tim Stuart  BVA-001 16  OP_TC             2             1  
Tim Stuart  BVA-002 17  OP_TC             2             1    
Tim Stuart  BVA-019 18  OP_TC             2             1  
Tim Stuart  BVA-044 22  OP_TC             2             1  
Tim Stuart  BVA-025 23  OP_TC             2             1  
Tim Stuart  BVA-016 26  OP_TC             2             1  
Tim Stuart  BVA-013 27  OP_TC             2             1  
Tim Stuart  BVA-001 16  OP_SICO           3             1  
Tim Stuart  BVA-002 17  OP_SICO           3             1  
Tim Stuart  BVA-013 27  OP_SICO           3             1  
;
run;

proc sort data=test;
  by treatmentid;
run;

data test2;
  set test;
  by treatmentid;
  retain smallest;

  **
  ** CREATE AN EMPTY HASH TABLE THAT WE CAN STORE A LIST OF 
  ** RANKS IN THAT HAVE ALREADY BEEN USED. DONE THIS WAY FOR
  ** SCALABILITY.
  *;
  if _n_ eq 1 then do;
    declare hash ht();
    ht.definekey ('rank');
    ht.definedone();
  end;

  if first.treatmentid then do;
    smallest = .;
  end;

  **
  ** IF THE CURRENT RANK HAS NOT ALREADY BEEN USED THEN 
  ** EVALUATE IT TO SEE IF ITS THE SMALLEST VALUE.
  *;
  if ht.find() ne 0 then do;
    smallest = min(smallest,rank);
  end;

  **
  ** SAVE THE SMALLEST UNUSED RANK BACK TO THE RANK VALUE.
  ** THEN ADD IT TO THE HASH TABLE AND FINALLY OUTPUT THE
  ** OBSERVATION.
  *;
  if last.treatmentid then do;
    rank = smallest;
    ht.add();    
    output;
  end;

  drop smallest;
run;
Run Code Online (Sandbox Code Playgroud)

SAS赢了吗?JK!;-)