Suppose I have the following data frame:
Base Coupled Derived Decl
1 0 0 1
1 7 0 1
1 1 0 1
2 3 12 1
1 0 4 1
Here is the dput output:
temp <- structure(list(Base = c(1L, 1L, 1L, 2L, 1L), Coupled = c(0L,7L, 1L, 3L, 0L), Derived = c(0L, 0L, 0L, 12L, 4L), Decl = c(1L, 1L, 1L, 1L, 1L)), .Names = c("Base", "Coupled", "Derived", "Decl"), row.names = c(NA, 5L), class = "data.frame")
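I want to compute the median of each column. Then, for each row, I want to count the cell values that are greater than the median of their respective column, and append that count as a column named AboveMedians.
In this example, the medians are c(1, 1, 0, 1). The result table I want is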
Base Coupled Derived Decl AboveMedians
1 0 0 1 0
1 7 0 1 1
1 1 0 1 0
2 3 12 1 3
1 0 4 1 1
What is the elegant R way to do this? I have something involving a for loop and sapply, but it doesn't seem optimal.
Thanks.
We can use colMedians from matrixStats after converting the data.frame to a matrix.
library(matrixStats)
Medians <- colMedians(as.matrix(temp))
Medians
#[1] 1 1 0 1
Then, replicate 'Medians' so that its dimensions match those of 'temp', do the comparison, and take the rowSums of the resulting logical matrix.
temp$AboveMedians <- rowSums(temp > Medians[col(temp)])
temp$AboveMedians
#[1] 0 1 0 3 1
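To see what the `Medians[col(temp)]` indexing is doing, here is a minimal sketch that spells the replication out as an explicit matrix. It uses `apply(m, 2, median)` as a base R stand-in for `colMedians()` so that it runs without matrixStats; the names `m`, `med`, `expanded` and `above` are illustrative and not part of the answer.

```r
# Toy data copied from the question
temp <- data.frame(
  Base    = c(1L, 1L, 1L, 2L, 1L),
  Coupled = c(0L, 7L, 1L, 3L, 0L),
  Derived = c(0L, 0L, 0L, 12L, 4L),
  Decl    = c(1L, 1L, 1L, 1L, 1L)
)

m   <- as.matrix(temp)
med <- apply(m, 2, median)  # assumption: base R stand-in for colMedians()

# col(m) holds the column number of every cell, so med[col(m)] repeats
# each column's median once per row, in column-major order.
expanded <- matrix(med[col(m)], nrow = nrow(m), dimnames = dimnames(m))

# Cell-wise comparison against the same-shaped matrix, then count per row.
above <- rowSums(m > expanded)
```

Every row of `expanded` is the vector of column medians, which is why a plain element-wise `>` lines each cell up with its own column's median.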
Or, a base R only option (run on the original four columns of 'temp', before AboveMedians is added) is
apply(temp, 2, median)
# Base Coupled Derived Decl
# 1 1 0 1
rowSums(sweep(temp, 2, apply(temp, 2, median), FUN = ">"))
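If the count has to be recomputed after AboveMedians has already been appended, the medians should be taken over the original columns only. One possible way to guard against that is a small wrapper that takes the column names explicitly. This is only a sketch: `count_above_medians` and its `cols` argument are hypothetical names, and it swaps `sweep()` for a double transpose that relies on R's vector recycling.

```r
# Toy data copied from the question
temp <- data.frame(
  Base    = c(1L, 1L, 1L, 2L, 1L),
  Coupled = c(0L, 7L, 1L, 3L, 0L),
  Derived = c(0L, 0L, 0L, 12L, 4L),
  Decl    = c(1L, 1L, 1L, 1L, 1L)
)

# Hypothetical helper: for each row, count the cells in `cols` that exceed
# the median of their column.
count_above_medians <- function(df, cols = names(df)) {
  m   <- as.matrix(df[cols])
  med <- apply(m, 2, median)
  # t(m) has one row per original column, so the length-ncol vector `med`
  # recycles down each of its columns and pairs with the matching column.
  rowSums(t(t(m) > med))
}

value_cols <- c("Base", "Coupled", "Derived", "Decl")
temp$AboveMedians <- count_above_medians(temp, value_cols)

# Safe to call again: the appended column is not in value_cols.
again <- count_above_medians(temp, value_cols)
```

Passing `value_cols` explicitly keeps the result stable no matter how many helper columns have been added to `temp` in the meantime.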