益智:将数字均匀分布在群组中

Dra*_*mmy 7 sql t-sql sql-server sql-server-2008-r2

这真的是一个难题.它之前可能已被问过,但我找不到任何东西,所以我想我会分享这个问题.

我正在尝试在应用程序中实现某种负载平衡,并将问题简化为我认为应该是一个简单的TSQL练习(该应用程序主要在SQL Server域(SQL Server 2008 R2)中).

基本上我有一个有两个整数的表; 唯一的,顺序的Id和非唯一的值.该表可以包含任意数量的记录,我想生成一个数据表,其中前n个最大值被分成单独的"分组",然后第二组n个最大值被分成单独的"分组".

我在下面有一份初稿,但我相信它可以改进......

DECLARE @GroupCount INT = 5

-- Set up the test data
DECLARE @test TABLE (Id INT IDENTITY(1, 1), Value INT)
INSERT  @Test (Value)
VALUES  (100), (456), (121), (402), (253), (872), (765), (6529), (1029), (342), (98), (1), (0), (4), (46), (23), (456), (416), (2323), (4579)


--Order by Value descending
;WITH cte AS
(
    SELECT  *
            ,ROW_NUMBER() OVER (ORDER BY Value DESC) RowNum
    FROM    @Test
)
--use modulus to split into grouping
, cte2 AS
(
    SELECT  *
            ,ROW_NUMBER() OVER (PARTITION BY RowNum % @GroupCount ORDER BY RowNum DESC) Rnk
    FROM    cte
)
SELECT  ROW_NUMBER() OVER (PARTITION BY Rnk ORDER BY Value DESC) AS 'Grouping'
    ,Value
    ,Id
FROM    cte2
ORDER BY [Grouping], Value ASC
Run Code Online (Sandbox Code Playgroud)

这适用于并生成以下数据集:

Grouping,   Value,      Id
========    =====       ==
1           46          15
1           342         10
1           765         7
1           6529        8
2           23          16
2           253         5
2           456         2
2           4579        20
3           4           14
3           121         3
3           456         17
3           2323        19
4           1           12
4           100         1
4           416         18
4           1029        9
5           0           13
5           98          11
5           402         4
5           872         6
Run Code Online (Sandbox Code Playgroud)

返回的数据集是正确的,因为前n个最大值被拆分为单独的分组,依此类推,但每个分组中的总值在分组1中与分组5(例如)相比完全不同.

分组和SUMmed时,我们可以看到非均匀传播:

Grouping,   SummedValues
========    ============
1           7682
2           5311
3           2904
4           1546
5           1372
Run Code Online (Sandbox Code Playgroud)

在尽可能少的行中,如何更好地平衡值,以便每个分组中的总值更均匀地分布?

Sql*_*Zim 1

这是有缺陷的,但对于给定的示例数据来说并不可怕。你的旅费可能会改变。

declare @groupcount int = 5;
create table t (id int identity(1, 1), value int);
insert  t values 
    (100),(456),(121),(402),(253),(872),(765),(6529),(1029),(342)
  , (98),(1),(0),(4),(46),(23),(456),(416),(2323),(4579);
;with cte as (
  select *
      , rn = row_number() over (order by value asc)
      , pct = value/sum(value+.0) over()
      , target = 1.0 / @groupcount 
  from t
)
, remaining as (
select id, value, rn
  , grp = convert(int,(sum(value) over (order by rn)/sum(value+.0) over())*@groupCount)+1
from cte
)
select
    grp = row_number() over (order by sum(value) desc)
  , sumValue = sum(value)
from remaining
group by grp
Run Code Online (Sandbox Code Playgroud)

rextester 演示:http://rextester.com/UNV61100

结果:

+-----+----------+
| grp | sumValue |
+-----+----------+
|   1 |     6529 |
|   2 |     4579 |
|   3 |     3483 |
|   4 |     2323 |
|   5 |     1901 |
+-----+----------+
Run Code Online (Sandbox Code Playgroud)


Sql Server 2008 兼容版本:

declare @groupcount int = 5;
create table t (id int identity(1, 1), value int);
insert  t values 
    (100),(456),(121),(402),(253),(872),(765),(6529),(1029),(342)
  , (98),(1),(0),(4),(46),(23),(456),(416),(2323),(4579);
;with cte as (
  select *
      , rn = row_number() over (order by value asc)
      , pct = value/tv.TotalValue
      , target = 1.0 / @groupcount 
  from t
    cross join (select TotalValue = sum(value+.0) from t) tv
)
, remaining as (
select id, value, rn
  , grp = convert(int,((x.sumValueOver/TotalValue)*@groupcount)+1)
from cte
  outer apply (
    select sumValueOver = sum(value) 
    from cte i
    where i.rn <= cte.rn
      ) x
)
select
    grp = row_number() over (order by sum(value) desc)
  , sumValue = sum(value)
from remaining
group by grp
Run Code Online (Sandbox Code Playgroud)

rextester 演示:http://rextester.com/DEUDJ77007

返回:

+-----+----------+
| grp | sumValue |
+-----+----------+
|   1 |     6529 |
|   2 |     4579 |
|   3 |     3483 |
|   4 |     2323 |
|   5 |     1901 |
+-----+----------+
Run Code Online (Sandbox Code Playgroud)