Dan*_*ein 9 r median dataframe
Here I create a new column that indicates whether myData is above or below its median.
### Median splits based on whole data
# Create some test data
myDataFrame <- data.frame(myData = runif(15), myFactor = rep(c("A", "B", "C"), 5))
# Create a column showing the median split
myBreaks <- quantile(myDataFrame$myData, c(0, .5, 1))
myDataFrame$MedianSplitWholeData <- cut(
  myDataFrame$myData,
  breaks = myBreaks,
  include.lowest = TRUE,
  labels = c("Below", "Above"))
# Check that it is correct
myDataFrame$AboveWholeMedian <- myDataFrame$myData > median(myDataFrame$myData)
myDataFrame
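As a small sanity check on the approach above, here is a sketch on a fixed toy vector (`x`, `split_label` and `above` are made-up names for illustration, not from the question). By default `cut()` uses right-closed intervals, so a value exactly equal to the median lands in "Below", which agrees with `x > median(x)` being FALSE for that value.

```r
# Hypothetical toy vector with a known median of 3
x <- c(1, 2, 3, 4, 5)

# Breaks at the minimum, median and maximum: 1, 3, 5
breaks <- quantile(x, c(0, .5, 1))

# Intervals are [1, 3] and (3, 5]; include.lowest keeps the minimum
split_label <- cut(x, breaks = breaks, include.lowest = TRUE,
                   labels = c("Below", "Above"))

# Independent check: strictly above the median
above <- x > median(x)

data.frame(x, split_label, above)
```

Note that `cut()` will stop with an error if the breaks are not unique, which can happen with heavily tied data; that is not a concern for `runif()` output.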
This works well. Now I want to do the same thing, but compute the median split within each level of myFactor.
I came up with this:
# Median splits within factor levels
byOutput <- by(myDataFrame$myData, myDataFrame$myFactor, function(x) {
  myBreaks <- quantile(x, c(0, .5, 1))
  MedianSplitByGroup <- cut(x,
    breaks = myBreaks,
    include.lowest = TRUE,
    labels = c("Below", "Above"))
  MedianSplitByGroup
})
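byOutput contains what I want. It correctly classifies each element within factor levels A, B and C. However, I would like to create a new column, myDataFrame$FactorLevelMedianSplit, that holds the newly computed median split.
How can I convert the output of the "by" command into a useful data frame column?
I suspect the "by" command may not be the R-like way to do this...
Update:
Using Thierry's example of a clever use of factor(), and having found the "ave" function in Spector's book, I arrived at this solution, which needs no additional packages.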
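# Per-group median, repeated for every row of that group
myDataFrame$MediansByFactor <- ave(
  myDataFrame$myData,
  myDataFrame$myFactor,
  FUN = median)
# Label each row relative to its own group's median
myDataFrame$FactorLevelMedianSplit <- factor(
  myDataFrame$myData > myDataFrame$MediansByFactor,
  levels = c(TRUE, FALSE),
  labels = c("Above", "Below"))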
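For the original question of getting per-group results back into a column, another base-R route is `split()` followed by `unsplit()`, which puts the pieces back in the original row order. The following is only a sketch: the helper name `medianSplit`, the variable `groupMedian` and the fixed seed are assumptions added for illustration.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
myDataFrame <- data.frame(myData = runif(15),
                          myFactor = rep(c("A", "B", "C"), 5))

# Hypothetical helper: the same per-group function used in the by() call
medianSplit <- function(x) {
  cut(x, breaks = quantile(x, c(0, .5, 1)), include.lowest = TRUE,
      labels = c("Below", "Above"))
}

# split() groups the values; unsplit() restores the original row order
groups <- split(myDataFrame$myData, myDataFrame$myFactor)
myDataFrame$FactorLevelMedianSplit <- unsplit(lapply(groups, medianSplit),
                                              myDataFrame$myFactor)

# Cross-check against per-group medians computed with ave()
groupMedian <- ave(myDataFrame$myData, myDataFrame$myFactor, FUN = median)
all((myDataFrame$FactorLevelMedianSplit == "Above") ==
      (myDataFrame$myData > groupMedian))
```

Compared with the `ave()` version, this keeps the `cut()` logic intact, so the factor levels stay in the order "Below", "Above".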
Here is a solution using the plyr package.
myDataFrame <- data.frame(myData = runif(15), myFactor = rep(c("A", "B", "C"), 5))
library(plyr)
# Split by myFactor, add the group median and the split label, then recombine
ddply(myDataFrame, "myFactor", function(x) {
  x$Median <- median(x$myData)
  x$FactorLevelMedianSplit <- factor(x$myData <= x$Median,
    levels = c(TRUE, FALSE), labels = c("Below", "Above"))
  x
})
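The split-apply-combine pattern behind that call can be sketched in base R as well. This is only an approximation of what `ddply` does, not plyr's actual implementation; `pieces`, `result` and the fixed seed are names and choices made up for this illustration.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
myDataFrame <- data.frame(myData = runif(15),
                          myFactor = rep(c("A", "B", "C"), 5))

# Split: one data frame per level of myFactor
pieces <- split(myDataFrame, myDataFrame$myFactor)

# Apply: add the group median and the split label to each piece
pieces <- lapply(pieces, function(x) {
  x$Median <- median(x$myData)
  x$FactorLevelMedianSplit <- factor(x$myData <= x$Median,
                                     levels = c(TRUE, FALSE),
                                     labels = c("Below", "Above"))
  x
})

# Combine: stack the pieces; rows come back grouped by level, not in input order
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

Because the rows are returned grouped by factor level, the result should be used as a new data frame rather than assigned column-by-column back onto the original.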