I have the following dataset:
PatientName BVAID Rank TreatmentCode TreatmentID DoseID
Tim Stuart BVA-027 3 OP_TBC 1 1
Tim Stuart BVA-041 4 OP_TBC 1 1
Tim Stuart BVA-021 7 OP_TBC 1 1
Tim Stuart BVA-048 10 OP_TBC 1 1
Tim Stuart BVA-020 14 OP_TBC 1 1
Tim Stuart BVA-024 15 OP_TBC 1 1
Tim Stuart BVA-001 16 OP_TBC 1 1
Tim Stuart BVA-013 27 OP_TBC 1 1
Tim Stuart BVA-018 28 OP_TBC 1 1
Tim Stuart BVA-051 29 OP_TBC 1 1
Tim Stuart BVA-027 3 OP_TC 2 1
Tim Stuart BVA-041 4 OP_TC 2 1
Tim Stuart BVA-048 10 OP_TC 2 1
Tim Stuart BVA-020 14 OP_TC 2 1
Tim Stuart BVA-001 16 OP_TC 2 1
Tim Stuart BVA-002 17 OP_TC 2 1
Tim Stuart BVA-019 18 OP_TC 2 1
Tim Stuart BVA-044 22 OP_TC 2 1
Tim Stuart BVA-025 23 OP_TC 2 1
Tim Stuart BVA-016 26 OP_TC 2 1
Tim Stuart BVA-013 27 OP_TC 2 1
Tim Stuart BVA-001 16 OP_SICO 3 1
Tim Stuart BVA-002 17 OP_SICO 3 1
Tim Stuart BVA-013 27 OP_SICO 3 1
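I need to output the record with the smallest Rank in each TreatmentID group. However, if that record was already output for a previous TreatmentID group, I need to pick the next-smallest Rank and output that record for the current TreatmentID group. I only need one record per TreatmentID group. This needs to be a scalable solution that can be automated. My output file will have only three unique records, i.e. one record per group, each with a unique BVAID and the smallest available Rank in its group.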
PatientName BVAID Rank TreatmentCode TreatmentID DoseID
Tim Stuart BVA-027 3 OP_TBC 1 1
Tim Stuart BVA-041 4 OP_TC 2 1
Tim Stuart BVA-001 16 OP_SICO 3 1
Which program would handle this better, SAS or R?
Mat*_*wle 13
A compact, scalable and readable R solution:
require(data.table)
DT = as.data.table(dat) # dat input from Brian's answer
r = 0
DT[,{r<<-min(Rank[Rank>r]); .SD[Rank==r]}, by=TreatmentID]
TreatmentID PatientName BVAID Rank TreatmentCode DoseID
[1,] 1 Tim Stuart BVA-027 3 OP_TBC 1
[2,] 2 Tim Stuart BVA-041 4 OP_TC 1
[3,] 3 Tim Stuart BVA-001 16 OP_SICO 1
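For readers less familiar with the `<<-` idiom, the one-liner above keeps a running threshold `r` and, for each TreatmentID in order, takes the smallest Rank strictly greater than it. Here is a rough Python sketch of that idea on a subset of the question's rows. The function name `pick_increasing` and the tuple layout are my own illustration, and skipping a group with no rank above the threshold is an assumption (the R code would return `Inf` with a warning in that case).

```python
from itertools import groupby
from operator import itemgetter

# Subset of the question's rows as (BVAID, Rank, TreatmentID)
rows = [
    ("BVA-027", 3, 1), ("BVA-041", 4, 1), ("BVA-021", 7, 1),
    ("BVA-027", 3, 2), ("BVA-041", 4, 2), ("BVA-048", 10, 2),
    ("BVA-001", 16, 3), ("BVA-002", 17, 3), ("BVA-013", 27, 3),
]

def pick_increasing(rows):
    """Per TreatmentID (ascending), keep the row whose Rank is the
    smallest one strictly greater than the previously kept Rank."""
    picked = []
    threshold = 0  # plays the role of r in the data.table call
    ordered = sorted(rows, key=itemgetter(2))  # groupby needs sorted input
    for _, group in groupby(ordered, key=itemgetter(2)):
        candidates = [row for row in group if row[1] > threshold]
        if not candidates:
            continue  # assumption: skip groups with nothing left to pick
        best = min(candidates, key=itemgetter(1))
        threshold = best[1]
        picked.append(best)
    return picked

result = pick_increasing(rows)
for row in result:
    print(row)
```

Note that this only ever moves the threshold upward, so a later group can never pick a rank below an earlier group's pick, even if that rank was never used.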
Here is an R solution. I would really like to know whether there is a more compact way to do this.
library(plyr)
df <- df[order(df$PatientName, df$TreatmentID),]
ddply(df, .(PatientName), function(DF) {
    # For each Treatment, find the value of Rank to be kept
    splitRanks <- split(DF$Rank, DF$TreatmentID)
    minRanks <- Reduce(f = function(X, Y) min(Y[Y>min(X)]),
                       x = splitRanks[-1],
                       init = min(splitRanks[[1]]), accumulate = TRUE)
    # For each Treatment, extract row w/ Rank determined by the calculation above
    splitDF <- split(DF, DF$TreatmentID)
    rows <- mapply(FUN = function(X, Y) X[X$Rank==Y,], splitDF, minRanks,
                   SIMPLIFY = FALSE)
    # Bind the extracted rows back together in a data frame
    do.call("rbind", rows)
})
# PatientName BVAID Rank TreatmentCode TreatmentID DoseID
# 1 Tim Stuart BVA-027 3 OP_TBC 1 1
# 2 Tim Stuart BVA-041 4 OP_TC 2 1
# 3 Tim Stuart BVA-001 16 OP_SICO 3 1
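The `Reduce(..., accumulate = TRUE)` step is the heart of this answer: it folds over the per-treatment rank vectors and carries the last kept rank forward. As a hedged illustration, the same fold can be written with Python's `itertools.accumulate` (the `initial` argument needs Python 3.8+). The dictionary below holds the ranks from the question; `next_rank` is a name I made up for the sketch.

```python
from itertools import accumulate

# Ranks from the question, keyed by TreatmentID
ranks_by_treatment = {
    1: [3, 4, 7, 10, 14, 15, 16, 27, 28, 29],
    2: [3, 4, 10, 14, 16, 17, 18, 22, 23, 26, 27],
    3: [16, 17, 27],
}
groups = [ranks_by_treatment[k] for k in sorted(ranks_by_treatment)]

def next_rank(previous, ranks):
    # Smallest rank in this group strictly above the rank kept so far.
    # Raises ValueError if no such rank exists.
    return min(r for r in ranks if r > previous)

# Start from the minimum of the first group, then fold over the rest,
# keeping every intermediate value (like accumulate = TRUE in R).
kept = list(accumulate(groups[1:], next_rank, initial=min(groups[0])))
print(kept)
```

The resulting list lines up with the sorted TreatmentIDs, which is what lets the `mapply` call pair each kept rank with its group's rows.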
My SAS solution. All steps are scalable:
data test;
input PatientName $ 1-10
BVAID $
Rank
TreatmentCode $
TreatmentID
DoseID
;
datalines;
Tim Stuart BVA-027 3 OP_TBC 1 1
Tim Stuart BVA-041 4 OP_TBC 1 1
Tim Stuart BVA-021 7 OP_TBC 1 1
Tim Stuart BVA-048 10 OP_TBC 1 1
Tim Stuart BVA-020 14 OP_TBC 1 1
Tim Stuart BVA-024 15 OP_TBC 1 1
Tim Stuart BVA-001 16 OP_TBC 1 1
Tim Stuart BVA-013 27 OP_TBC 1 1
Tim Stuart BVA-018 28 OP_TBC 1 1
Tim Stuart BVA-051 29 OP_TBC 1 1
Tim Stuart BVA-027 3 OP_TC 2 1
Tim Stuart BVA-041 4 OP_TC 2 1
Tim Stuart BVA-048 10 OP_TC 2 1
Tim Stuart BVA-020 14 OP_TC 2 1
Tim Stuart BVA-001 16 OP_TC 2 1
Tim Stuart BVA-002 17 OP_TC 2 1
Tim Stuart BVA-019 18 OP_TC 2 1
Tim Stuart BVA-044 22 OP_TC 2 1
Tim Stuart BVA-025 23 OP_TC 2 1
Tim Stuart BVA-016 26 OP_TC 2 1
Tim Stuart BVA-013 27 OP_TC 2 1
Tim Stuart BVA-001 16 OP_SICO 3 1
Tim Stuart BVA-002 17 OP_SICO 3 1
Tim Stuart BVA-013 27 OP_SICO 3 1
;
run;
proc sort data=test;
by treatmentid;
run;
data test2;
  set test;
  by treatmentid;
  length smallest_bvaid $ 8;
  retain smallest smallest_bvaid;
  **
  ** CREATE AN EMPTY HASH TABLE THAT WE CAN STORE A LIST OF
  ** RANKS IN THAT HAVE ALREADY BEEN USED. DONE THIS WAY FOR
  ** SCALABILITY.
  *;
  if _n_ eq 1 then do;
    declare hash ht();
    ht.definekey ('rank');
    ht.definedone();
  end;
  if first.treatmentid then do;
    smallest = .;
    smallest_bvaid = ' ';
  end;
  **
  ** IF THE CURRENT RANK HAS NOT ALREADY BEEN USED THEN
  ** EVALUATE IT TO SEE IF ITS THE SMALLEST VALUE, AND
  ** REMEMBER THE BVAID THAT BELONGS TO THAT RANK.
  *;
  if ht.find() ne 0 then do;
    if missing(smallest) or rank < smallest then do;
      smallest = rank;
      smallest_bvaid = bvaid;
    end;
  end;
  **
  ** SAVE THE SMALLEST UNUSED RANK AND ITS BVAID BACK TO THE
  ** CURRENT OBSERVATION. THEN ADD THE RANK TO THE HASH TABLE
  ** AND FINALLY OUTPUT THE OBSERVATION.
  *;
  if last.treatmentid then do;
    rank = smallest;
    bvaid = smallest_bvaid;
    ht.add();
    output;
  end;
  drop smallest smallest_bvaid;
run;
Does SAS win? JK! ;-)
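The SAS step differs slightly from the R answers: instead of a rising threshold, it records every rank already output in a hash table and picks the smallest rank not yet in it. A minimal Python sketch of that bookkeeping, with a `set` standing in for the hash object, might look like this. The function name `pick_unused` and the dict-based rows are hypothetical, the data is a subset of the question's rows, and skipping a group whose ranks are all used is my assumption.

```python
from collections import defaultdict

# Subset of the question's rows
rows = [
    {"BVAID": "BVA-027", "Rank": 3, "TreatmentID": 1},
    {"BVAID": "BVA-041", "Rank": 4, "TreatmentID": 1},
    {"BVAID": "BVA-021", "Rank": 7, "TreatmentID": 1},
    {"BVAID": "BVA-027", "Rank": 3, "TreatmentID": 2},
    {"BVAID": "BVA-041", "Rank": 4, "TreatmentID": 2},
    {"BVAID": "BVA-048", "Rank": 10, "TreatmentID": 2},
    {"BVAID": "BVA-001", "Rank": 16, "TreatmentID": 3},
    {"BVAID": "BVA-002", "Rank": 17, "TreatmentID": 3},
    {"BVAID": "BVA-013", "Rank": 27, "TreatmentID": 3},
]

def pick_unused(rows):
    """Per TreatmentID (ascending), keep the row with the smallest Rank
    that has not been output for an earlier group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["TreatmentID"]].append(row)

    used_ranks = set()  # stands in for the SAS hash table keyed on rank
    picked = []
    for treatment_id in sorted(groups):
        candidates = [r for r in groups[treatment_id]
                      if r["Rank"] not in used_ranks]
        if not candidates:
            continue  # assumption: nothing to output for this group
        best = min(candidates, key=lambda r: r["Rank"])
        used_ranks.add(best["Rank"])
        picked.append(best)  # keep the whole row, so BVAID matches Rank
    return picked

result = pick_unused(rows)
for row in result:
    print(row["TreatmentID"], row["BVAID"], row["Rank"])
```

On the question's data both strategies give the same three records. They can diverge when a later group contains an unused rank that is lower than an earlier pick, which the set-based approach will accept and the threshold approach will not.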