Rom*_*rik 10 split r dataframe
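Splitting a data.frame by rows according to a grouping factor is fairly easy. But how do you split it by columns and possibly apply a function?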
my.df <- data.frame(a = runif(10),
b = runif(10),
c = runif(10),
d = runif(10))
grp <- as.factor(c(1,1, 2,2))
What I want is the means by group of columns.
What I have so far is a poor man's apply.
lapply(as.list(as.numeric(levels(grp))), FUN = function(x, cn, data) {
rowMeans(data[grp %in% x])
}, cn = grp, data = my.df)
EDIT: Thank you all for participating. I ran 10 replicates* on my working data.frame, which has about 22,000 rows. These are the results in seconds.
Roman: 2.19
Joris: 4.60
Joris #2: 3.79 #changed sapply to lapply as suggested by Joris in the [R chatroom][1].
Gavin: 4.70
James & EDi: > 200 # * ran only one replicate due to the large order of magnitude difference
It strikes me as odd that there is no wrapper function for the task at hand. Maybe one day we will be able to do
apply(X = my.df, MARGIN = 3, INDEX = my.groups, FUN = mean) # :)
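Until something like that exists, a small helper can stand in for it. This is only a minimal sketch: `col_group_apply` is a hypothetical name, not a base R function, and it assumes `index` has one entry per column of `data`, like `grp` above.

```r
# Hypothetical helper (not part of base R): apply FUN to each group of columns.
# `index` assigns every column of `data` to a group, like `grp` above.
col_group_apply <- function(data, index, FUN, ...) {
  index <- as.factor(index)
  stopifnot(length(index) == ncol(data))
  # Splitting the column positions gives one integer vector per group
  cols <- split(seq_along(index), index)
  # drop = FALSE keeps a data frame even when a group has a single column
  lapply(cols, function(j) FUN(data[, j, drop = FALSE], ...))
}

my.df <- data.frame(a = runif(10), b = runif(10), c = runif(10), d = runif(10))
grp <- as.factor(c(1, 1, 2, 2))

# One vector of row means per column group, named by the levels of grp
res <- col_group_apply(my.df, grp, rowMeans)
str(res)
```

The result is a list named by the levels of `grp`, so any function that accepts a data frame (for example `rowSums`) can be passed as `FUN`.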
You can use the same logic, but in a more convenient form:
sapply(levels(grp),function(x)rowMeans(my.df[which(grp==x)]))
Convert my.df to a list and split it, then apply your function to each subset of list components after coercing that subset back to a data frame:
lapply(split(as.list(my.df), grp), function(x) rowMeans(as.data.frame(x)))
This gives:
> lapply(split(as.list(my.df), grp), function(x) rowMeans(as.data.frame(x)))
$`1`
[1] 0.8229189 0.4901288 0.2057578 0.6531641 0.3897858 0.4225179
[7] 0.3905410 0.3928784 0.1715857 0.3973192
$`2`
[1] 0.61348623 0.61229702 0.31938521 0.28325342 0.25857158
[6] 0.49071991 0.01179999 0.57639186 0.38407240 0.17467337
This is equivalent to @Roman's "poor man's apply":
> roman <- lapply(as.list(as.numeric(levels(grp))),
+ FUN = function(x, cn, data) {
+ rowMeans(data[grp %in% x])
+ }, cn = grp, data = my.df)
> gavin <- lapply(split(as.list(my.df), grp),
+ function(x) rowMeans(as.data.frame(x)))
> all.equal(roman, gavin)
[1] "names for current but not for target"
except for the names on the components.
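To check that the names really are the only difference, the comparison can be repeated with the names dropped from the split-based result. This is a sketch on freshly generated data; the seed is an arbitrary choice made here for reproducibility and is not part of the original post.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
my.df <- data.frame(a = runif(10), b = runif(10), c = runif(10), d = runif(10))
grp <- as.factor(c(1, 1, 2, 2))

# Roman's approach: index the columns by group level
roman <- lapply(as.list(as.numeric(levels(grp))),
                function(x) rowMeans(my.df[grp %in% x]))

# Gavin's approach: split the list of columns, then coerce each piece back
gavin <- lapply(split(as.list(my.df), grp),
                function(x) rowMeans(as.data.frame(x)))

# unname() removes the "1"/"2" component names before comparing
same_values <- all.equal(roman, unname(gavin))
same_values
```

If `same_values` is `TRUE`, the two approaches agree on every value and differ only in the component names.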