Creating groups of equal sum in R

Question

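I am trying to split a column of my data.frame/data.table into three groups so that every group has the same sum.

The data is first sorted from smallest to largest, so the first group consists of a large number of rows with small values and the third group consists of a small number of rows with large values. Something in the spirit of the following does the job: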

test <- data.frame(x = as.numeric(1:100000))
test$y <- NA_real_    # group label, filled in by the loop
store <- 0            # running cumulative sum of x
total <- sum(test$x)

for (i in seq_len(nrow(test))) {
  store <- store + test$x[i]
  if (store < total / 3) {
    test$y[i] <- 1
  } else {
    if (store < 2 * total / 3) {
      test$y[i] <- 2
    } else {
      test$y[i] <- 3
    }
  }
}


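This works, but I feel there must be a better way (perhaps a very obvious solution that I am missing).

I never like using loops, especially with nested ifs, when a vectorised approach is available; even with 100,000+ records this code becomes quite slow.
This approach becomes impossibly complex to code for a larger number of groups (not necessarily the loop, but the ifs).
It requires sorting the column beforehand. There may be no way around this one.

As a nuance (not that it makes a difference), the data to be summed will not always (or ever) be consecutive integers.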

Answer 1

ber*_*ant 6

Maybe with cumsum:

test$z <- cumsum(test$x) %/% (ceiling(sum(test$x) / 3)) + 1
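
The same cumulative-sum idea can be extended to any number of groups. The following is only a minimal sketch: the helper name `equal_sum_groups` and its `k` argument are made up for illustration and are not part of the original answer, and the input is assumed to be sorted already. It uses base R's `findInterval()` instead of integer division, which also avoids one edge case of the one-liner above: when `sum(x)` is an exact multiple of 3 (for example `x = 1:3`), the integer division labels the last row as group 4.

```r
# Hypothetical helper (not from the original answer): assign each element of
# an already sorted numeric vector to one of k groups by cumulative sum.
equal_sum_groups <- function(x, k = 3) {
  cs <- cumsum(x)
  # k - 1 interior break points at total/k, 2*total/k, ...
  breaks <- sum(x) * seq_len(k - 1) / k
  # findInterval() counts how many break points each cumulative sum has
  # reached, so the labels run from 1 to k and never exceed k
  findInterval(cs, breaks) + 1L
}

test <- data.frame(x = as.numeric(1:100000))
test$z <- equal_sum_groups(test$x, k = 3)
tapply(test$x, test$z, sum)   # per-group sums, to check how close they are
```

Because the groups are contiguous runs of the sorted column, the sums are only approximately equal; how close they get depends on the data.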


Answer 2

Sam*_*rke 5

This is more or less a bin-packing problem.

Use the binPack function from the BBmisc package:

library(BBmisc)
test$bins <- binPack(test$x, sum(test$x)/3+1)


The sums of the 3 bins are nearly identical:

tapply(test$x, test$bins, sum)


         1          2          3 
1666683334 1666683334 1666683332
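
For readers who want to see the bin-packing idea without an extra package, here is a rough base R sketch of a greedy "largest item first" heuristic. The function name `greedy_equal_sum` is hypothetical, the sample data is made up, and this is not claimed to be what binPack does internally; it only illustrates the general technique on non-integer data.

```r
# Hypothetical helper, not part of BBmisc: greedy "largest item first" split
# of x into k groups with roughly equal sums.
greedy_equal_sum <- function(x, k = 3) {
  ord <- order(x, decreasing = TRUE)   # visit the largest values first
  sums <- numeric(k)                   # running total of each group
  grp <- integer(length(x))            # group label for each element of x
  for (i in ord) {
    j <- which.min(sums)               # currently lightest group
    grp[i] <- j
    sums[j] <- sums[j] + x[i]
  }
  grp
}

set.seed(1)                            # arbitrary seed, for reproducibility only
x <- runif(1000, min = 0, max = 100)   # made-up non-integer data
g <- greedy_equal_sum(x, k = 3)
tapply(x, g, sum)                      # per-group sums
```

Unlike the cumulative-sum approach, the groups produced this way are not contiguous runs of the sorted column, so this only fits if group membership does not have to follow the sort order. The heuristic still loops over the rows, but the number of groups is just a parameter, so no extra ifs are needed for more groups.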

