A smarter ntile

Man*_*ngo 6 postgresql sql-server window-functions rank

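The main problem with the ntile() window function is that it arbitrarily splits the rows into roughly equal-sized groups, regardless of the actual values.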

For example, with the following query:

select
    id,title,price,
    row_number() over(order by price) as row_number,
    rank() over(order by price) as rank,
    count(*) over(order by price) as count,
    dense_rank() over(order by price) as dense_rank,
    ntile(10) over(order by price) as decile
from paintings
order by price;

I get 10 groups of paintings of roughly equal size, and it is quite likely that paintings with the same price end up in different bins.

For example, my sample data contains four items with a price of 12, but two of them land in decile 1 and the other two in decile 2. I would like to keep such items together, and I'm not fussy about which decile they end up in.

I have included the other window functions in the query for comparison.

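It appears that ntile() uses only row_number() and bases its cut-offs on that. It would be fairer if it used rank() or count(*) instead, because items with the same price would then end up in the same bin.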
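To make the split concrete, here is a minimal sketch on four made-up prices (not the paintings table), three of them tied at 12. The column aliases row_num, rnk and half are just names chosen for this illustration.

```sql
-- Four hypothetical prices; three of them are tied at 12.
select price,
       row_number() over (order by price) as row_num, -- 1..4, ties broken arbitrarily
       rank()       over (order by price) as rnk,     -- 1, 2, 2, 2
       ntile(2)     over (order by price) as half     -- first two rows -> 1, last two -> 2
from (values (11), (12), (12), (12)) as t(price)
order by price;
```

rank() gives all three 12s the same value, but ntile(2) simply cuts the four sorted rows in the middle, so one of the 12s lands in half 1 and the other two in half 2.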

This is the behaviour in both PostgreSQL and SQL Server, and presumably in the other DBMSs as well.

The question is: is there a way to achieve this?

Mik*_*son 5

You can use rank() and do an integer division by the number of rows per bin.

declare @T table(id int, title varchar(100), price int);

insert into @T(id, title, price) values
(19, 'Deux fillettes, fond jaune et rouge        ', 11),
(17, 'Flowers in a Pitcher                       ', 12),
(5 , 'Composition with Red, Yellow and Blue      ', 12),
(18, 'La lecon de musique (The Music Lesson)     ', 12),
(9 , 'The Adoration of the Magi                  ', 12),
(29, 'Self-Portrait                              ', 14),
(25, 'Symphony in White, No. 1: The White Girl   ', 14),
(30, 'The Anatomy Lecture of Dr. Nicolaes Tulp   ', 14),
(20, 'Les repasseuses (Women Ironing)            ', 14),
(1 , 'The Birth of Venus                         ', 15),
(12, 'Femme se promenant dans une foret exotique ', 15),
(24, 'Portrait of the Painter’s Mother           ', 15),
(28, 'Jeunes filles au piano                     ', 15),
(7 , 'Portrait de l artiste (Self-portrait)      ', 16),
(3 , 'The Last Supper                            ', 16),
(13, 'Combat of a Tiger and a Buffalo            ', 16),
(4 , 'The Creation of Man                        ', 17),
(22, 'Le Chemin de Fer                           ', 17),
(6 , 'Femmes de Tahiti [Sur la plage]            ', 18),
(21, 'Le Bar aux Folies-Berg                     ', 18),
(26, 'Lady at the Piano                          ', 18),
(15, 'Remembrance of a Garden                    ', 18),
(16, '1914                                       ', 18),
(14, 'Ancient Sound, Abstract on Black           ', 19),
(8 , 'The Large Turf                             ', 19),
(23, 'On the Beach                               ', 19),
(2 , 'Portrait of Mona Lisa                      ', 19),
(27, 'On the Terrace                             ', 20),
(10, 'The She-Wolf                               ', 20);

declare @BinCount int = 10;
declare @BinSize int;
select @BinSize = 1 + count(*) / @BinCount from @T;

select T.id,
       T.title,
       T.price,
       1 + rank() over(order by T.price) / @BinSize as decile
from @T as T;

Result:

id  title                                       price  decile
--- ------------------------------------------- ------ --------------------
19  Deux fillettes, fond jaune et rouge         11     1
17  Flowers in a Pitcher                        12     1
5   Composition with Red, Yellow and Blue       12     1
18  La lecon de musique (The Music Lesson)      12     1
9   The Adoration of the Magi                   12     1
29  Self-Portrait                               14     3
25  Symphony in White, No. 1: The White Girl    14     3
30  The Anatomy Lecture of Dr. Nicolaes Tulp    14     3
20  Les repasseuses (Women Ironing)             14     3
1   The Birth of Venus                          15     4
12  Femme se promenant dans une foret exotique  15     4
24  Portrait of the Painter’s Mother            15     4
28  Jeunes filles au piano                      15     4
7   Portrait de l artiste (Self-portrait)       16     5
3   The Last Supper                             16     5
13  Combat of a Tiger and a Buffalo             16     5
4   The Creation of Man                         17     6
22  Le Chemin de Fer                            17     6
6   Femmes de Tahiti [Sur la plage]             18     7
21  Le Bar aux Folies-Berg                      18     7
26  Lady at the Piano                           18     7
15  Remembrance of a Garden                     18     7
16  1914                                        18     7
14  Ancient Sound, Abstract on Black            19     9
8   The Large Turf                              19     9
23  On the Beach                                19     9
2   Portrait of Mona Lisa                       19     9
27  On the Terrace                              20     10
10  The She-Wolf                                20     10

"and I'm not fussy about which decile"

Note that bins 2 and 8 end up empty with the sample data.
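PostgreSQL has no table variables, so there the same idea can probably be written as a single statement. This is only a sketch: it assumes the paintings table from the question, hard-codes 10 bins, and the names bins, ranked, bin_size and decile_no_gaps are made up here. The dense_rank() column is an optional addition, not part of the approach above, for anyone who wants the non-empty bins numbered consecutively.

```sql
with bins as (
    -- Rows per bin, computed the same way as @BinSize (integer division).
    select 1 + count(*) / 10 as bin_size
    from paintings
),
ranked as (
    select p.id,
           p.title,
           p.price,
           -- Tied prices share a rank, so they share a bin.
           1 + rank() over (order by p.price) / b.bin_size as decile
    from paintings as p
    cross join bins as b
)
select id,
       title,
       price,
       decile,
       -- Optional: renumber the bins that are actually used, without gaps.
       dense_rank() over (order by decile) as decile_no_gaps
from ranked
order by price;
```

The gaps come from rank() jumping over ties. With the sample data the bin size is 1 + 29 / 10 = 3, and the rank goes from 2 (price 12) straight to 6 (price 14), giving 1 + 2 / 3 = 1 and 1 + 6 / 3 = 3, so nothing maps to bin 2. With dense_rank() on top, the eight bins in use (1, 3, 4, 5, 6, 7, 9, 10) would be renumbered 1 to 8.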