Select distinct groups of rows based on an average

Question

Fra*_*s P 8 algorithm sql-server-2008 linq-to-sql c#-4.0

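I have a table t_stats with the columns id (INT) and ratio (DECIMAL(8,4)). id is unique.

I want to query t_stats so as to select 3 groups whose AVG(ratio) is the same, or as close to each other as possible.

Using temporary tables is fine, as long as I can run it as a script or a stored procedure.

Edit: here is a concrete example: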

INPUT:

id    ratio
--    -----
24  0.930000
25  0.390000
26  0.800000
27  0.920000
28  0.550000
30  0.810000
31  0.770000
32  0.800000
33  0.590000
36  0.760000
37  0.910000
40  0.690000
43  0.390000
45  0.310000
46  0.760000
47  0.710000
54  0.710000
55  0.950000
57  0.920000
60  0.890000
62  0.700000
66  0.890000
68  0.950000
107 0.760000
559 0.990000
560 0.540000
565 0.430000
566 0.830000
568 0.590000
579 0.970000
599 0.900000
623 0.450000
749 0.800000
750 0.970000
753 0.820000
754 0.730000
766 0.620000
768 0.430000
770 0.790000
838 0.700000
875 0.835000
987 0.900000
988 0.740000
1157    0.850000
1250    0.630000
1328    0.860000
2171    0.900000
2176    0.520000
2177    0.980000
2178    0.940000
2180    0.970000
2184    0.990000
2187    0.950000
2188    0.940000
2189    0.920000
2195    0.990000
2233    0.900000
2234    0.940000
2235    0.950000
2240    0.980000
2243    0.920000
2253    0.900000
2266    0.530000
2269    0.920000
2270    0.970000
2271    0.750000
2272    0.820000
2275    0.910000
2277    0.930000
2281    0.690000
2282    0.710000
2288    0.840000
2528    0.870000
2778    0.950000
2814    0.990000

OUTPUT:

groupId    id     ratio
-------    --     -----
1       24      0.930000
1       25      0.390000
1       27      0.920000
1       30      0.810000
1       32      0.800000
1       36      0.760000
1       54      0.710000
1       60      0.890000
1       559     0.990000
1       560     0.540000
1       566     0.830000
1       568     0.590000
1       623     0.450000
1       750     0.970000
1       838     0.700000
1       987     0.900000
1       1157        0.850000
1       2178        0.940000
1       2180        0.970000
1       2253        0.900000
1       2269        0.920000
1       2271        0.750000
1       2281        0.690000
1       2778        0.950000
1       2814        0.990000
2       26      0.800000
2       28      0.550000
2       31      0.770000
2       40      0.690000
2       45      0.310000
2       55      0.950000
2       57      0.920000
2       66      0.890000
2       107     0.760000
2       565     0.430000
2       579     0.970000
2       753     0.820000
2       754     0.730000
2       766     0.620000
2       875     0.835000
2       1328        0.860000
2       2176        0.520000
2       2177        0.980000
2       2184        0.990000
2       2187        0.950000
2       2189        0.920000
2       2233        0.900000
2       2234        0.940000
2       2275        0.910000
2       2282        0.710000
3       33      0.590000
3       37      0.910000
3       43      0.390000
3       46      0.760000
3       47      0.710000
3       62      0.700000
3       68      0.950000
3       599     0.900000
3       749     0.800000
3       768     0.430000
3       770     0.790000
3       988     0.740000
3       1250        0.630000
3       2171        0.900000
3       2188        0.940000
3       2195        0.990000
3       2235        0.950000
3       2240        0.980000
3       2243        0.920000
3       2266        0.530000
3       2270        0.970000
3       2272        0.820000
3       2277        0.930000
3       2288        0.840000
3       2528        0.870000

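So I want to build 3 groups of n values each, targeting a specific average x. (For example, n=30 and 0.75 < x < 0.85 would mean 3 groups of 30 values each, where every group has 0.75 < AVG(ratio) < 0.85 and each id can belong to only 1 group.)

So the average of each group is nearly the same and close to x: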

groupId     avg(ratio)
-------     ----------
1           0.805600
2           0.789000
3           0.797600
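
The acceptance criteria described above (n items per group, group average inside the target window) can be checked with a single aggregate query. The following is only a sketch: the table variable @assigned, the variables @n, @lo and @hi, and the column meetsTarget are names made up for illustration, and the nine rows are a tiny hand-picked subset of the sample data, not a real solution.

```sql
-- Hypothetical table of rows that have already been assigned to a group.
declare @assigned table (
      groupId int not null
    , id int not null primary key   -- an id can belong to only one group
    , ratio decimal(8,4) not null
)
insert into @assigned (groupId, id, ratio) values
    (1, 24, 0.93), (1, 25, 0.39), (1, 27, 0.92),
    (2, 26, 0.80), (2, 28, 0.55), (2, 55, 0.95),
    (3, 33, 0.59), (3, 37, 0.91), (3, 68, 0.95)

-- Assumed targets: n items per group, lo < AVG(ratio) < hi
declare @n int = 3
declare @lo decimal(8,4) = 0.75
declare @hi decimal(8,4) = 0.85

-- One row per group: size, average, and whether both criteria hold
select groupId
    , count(*) as items
    , avg(ratio) as avgRatio
    , case when count(*) = @n and avg(ratio) > @lo and avg(ratio) < @hi
           then 1 else 0 end as meetsTarget
from @assigned
group by groupId
order by groupId
```

The primary key on id enforces the "only 1 group" rule at insert time, so the query itself only has to look at group size and average.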

Answer 1

Tim*_*ner 3

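Here is a procedural T-SQL version that works somewhat like a sports draft, except that the pick order is re-evaluated every round based on need.

If every item has to be picked, this "competitive" nature seems to lead to slightly less-than-perfect ratios, but the upside is that you basically have an O(N^2) algorithm, since it is essentially a loop of min functions (that may be optimistic, given the group by clauses). It is also deterministic and should be fairly simple to implement in another tier if needed.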
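
To make the "loop of min functions" concrete, a single pick can be looked at in isolation. The sketch below reuses the variable names @targetRatio, @ratio and @Num and the NewRatioDist expression from the script that follows; the table variable @candidates, the NewAvg column and the starting state (a group holding 2 items with an average of 0.90) are assumptions chosen purely for illustration.

```sql
-- Illustrative state: the picking group already holds 2 items averaging 0.90.
declare @targetRatio decimal(8,4) = 0.8
declare @ratio decimal(8,4) = 0.9   -- current average of the picking group
declare @Num int = 2                -- items already in that group

-- Four unassigned candidates taken from the sample data.
declare @candidates table (
      id int not null primary key
    , ratio decimal(8,4) not null
)
insert into @candidates (id, ratio) values
    (25, 0.39), (26, 0.80), (28, 0.55), (766, 0.62)

-- For each candidate: the average the group would have after taking it,
-- and how far that new average would be from the target.
select id
    , ratio
    , (ratio + (@ratio * @Num)) / (@Num + 1) as NewAvg
    , abs(((ratio + (@ratio * @Num)) / (@Num + 1)) - @targetRatio) as NewRatioDist
from @candidates
order by NewRatioDist, id
```

With these numbers the first row is id 766 (ratio 0.62), because it pulls the group average to about 0.8067, whereas id 26 (ratio 0.80, exactly on target by itself) would leave the average at about 0.8667. The pick minimizes the distance of the resulting group average, not the distance of the individual item.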

-- SET THESE!
declare @numberOfGroups int = 3
declare @itemsPerGroup int = 25
declare @targetRatio decimal(8,4) = .8
-- /SET

set nocount on

-- Create a table of items
declare @t_stats table (
      id int not null primary key
    , ratio decimal(8,4) not null
    , grp int null
)
insert into @t_stats (id, ratio) values
    (24,0.930000), (25,0.390000), (26,0.800000),
    (27,0.920000), (28,0.550000), (30,0.810000),
    (31,0.770000), (32,0.800000), (33,0.590000),
    (36,0.760000), (37,0.910000), (40,0.690000),
    (43,0.390000), (45,0.310000), (46,0.760000),
    (47,0.710000), (54,0.710000), (55,0.950000),
    (57,0.920000), (60,0.890000), (62,0.700000),
    (66,0.890000), (68,0.950000), (107,0.760000),
    (559,0.990000), (560,0.540000), (565,0.430000),
    (566,0.830000), (568,0.590000), (579,0.970000),
    (599,0.900000), (623,0.450000), (749,0.800000),
    (750,0.970000), (753,0.820000), (754,0.730000),
    (766,0.620000), (768,0.430000), (770,0.790000),
    (838,0.700000), (875,0.835000), (987,0.900000),
    (988,0.740000), (1157,0.850000), (1250,0.630000),
    (1328,0.860000), (2171,0.900000), (2176,0.520000),
    (2177,0.980000), (2178,0.940000), (2180,0.970000),
    (2184,0.990000), (2187,0.950000), (2188,0.940000),
    (2189,0.920000), (2195,0.990000), (2233,0.900000),
    (2234,0.940000), (2235,0.950000), (2240,0.980000),
    (2243,0.920000), (2253,0.900000), (2266,0.530000),
    (2269,0.920000), (2270,0.970000), (2271,0.750000),
    (2272,0.820000), (2275,0.910000), (2277,0.930000),
    (2281,0.690000), (2282,0.710000), (2288,0.840000),
    (2528,0.870000), (2778,0.950000), (2814,0.990000)

-- Create a table of groups
declare @groups table (
    grp int not null primary key identity
)
while (select isnull(max(grp), 0) from @groups) < @numberOfGroups begin
    insert into @groups default values
end

-- Check that we have enough items to fill all groups
if @numberOfGroups * @itemsPerGroup <= (select count(*) from @t_stats) begin

    -- Groups now pick the best-fitting items one at a time
    while (select count(*) from @t_stats where grp is not null) < (select count(*) * @itemsPerGroup from @groups) begin
        declare @grp int, @Num int, @ratio decimal(8,4), @id int

        -- Find the group with the fewest items; among those, the one whose
        -- average is farthest from the target (the neediest) picks first
        select top 1 @grp = grp, @Num = Num, @ratio = ratio
        from (
            select g.grp
                , count(i.grp) as Num
                , isnull(avg(i.ratio), 0.0) as ratio
                , abs(@targetRatio - avg(i.ratio)) as RatioDist
            from @groups g
                left join @t_stats i on g.grp = i.grp
            group by g.grp
        ) as a
        order by Num, RatioDist desc, grp

        -- Let that group make their best pick
        select top 1 @id = id
        from (
            select id
                , abs(((ratio + (@ratio * @Num)) / (@Num + 1)) - @targetRatio) as NewRatioDist
            from @t_stats
            where grp is null
        ) as a
        order by NewRatioDist, id -- id breaks ties so the pick is repeatable

        -- Update the items table based upon the pick
        update @t_stats set grp = @grp where id = @id

    end

end
else begin
    -- Not enough items
    raiserror('Too many groups or items per group.', 17, 0)
end

-- Display the results
select grp, count(*) as Num, avg(ratio) as ratio
from @t_stats
group by grp
order by grp
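
The script ends with a per-group summary, whereas the question asks for one row per item. A minimal sketch of that listing follows; it uses a small hypothetical stand-in for the populated @t_stats table variable so that it runs without the rest of the script. In the script itself, only the final select would need to be appended to the same batch.

```sql
-- Hypothetical stand-in for @t_stats after the picking loop has run.
declare @t_stats table (
      id int not null primary key
    , ratio decimal(8,4) not null
    , grp int null
)
insert into @t_stats (id, ratio, grp) values
    (24, 0.93, 1), (25, 0.39, 1),
    (26, 0.80, 2), (28, 0.55, 2),
    (33, 0.59, 3), (37, 0.91, 3),
    (40, 0.69, null)   -- an item that no group picked

-- One row per assigned item, in the shape of the question's OUTPUT table
select grp as groupId, id, ratio
from @t_stats
where grp is not null
order by grp, id
```

The where clause only matters when there are more items than @numberOfGroups * @itemsPerGroup; with the sample data (75 items, 3 groups of 25) every row ends up assigned.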
