Man*_*ngo 6 postgresql sql-server window-functions rank
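The main problem with the ntile() window function is that it arbitrarily splits the rows into groups of roughly equal size, regardless of the actual values.
For example, with the following query: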
select
id,title,price,
row_number() over(order by price) as row_number,
rank() over(order by price) as rank,
count(*) over(order by price) as count,
dense_rank() over(order by price) as dense_rank,
ntile(10) over(order by price) as decile
from paintings
order by price;
I get 10 groups of roughly equal size, and paintings with the same price are quite likely to end up in different bins.
For example:
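Note that there are four items with a price of 12, but two of them are in decile 1 and the other two are in decile 2. I would like to keep these items together, and I am not fussy about which decile they end up in.
I have included the other window functions for comparison.
It appears that ntile() uses only row_number() and bases its cut-offs on that. It would be fairer if it used rank() or count(*) instead, because items with the same price would then end up in the same bin.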
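To make the suspicion about row_number() concrete, here is a small Python model of how ntile(10) appears to assign bins: purely by row position, with any leftover rows going to the first bins. It is a sketch of the observed behaviour, not either engine's implementation. The function name `ntile` and the variable names are illustrative, and the price list is copied from the sample data in the answer below (titles omitted).

```python
from collections import defaultdict

def ntile(row_count, buckets):
    """Model of ntile(): assign bucket numbers by row position only.

    Rows are split as evenly as possible; when row_count is not divisible
    by buckets, the first (row_count % buckets) buckets get one extra row.
    """
    base, extra = divmod(row_count, buckets)
    tiles = []
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        tiles.extend([bucket] * size)
    return tiles

# Prices of the 29 sample paintings, already sorted (as "order by price" would do).
prices = ([11] + [12] * 4 + [14] * 4 + [15] * 4 + [16] * 3
          + [17] * 2 + [18] * 5 + [19] * 4 + [20] * 2)

deciles = ntile(len(prices), 10)

# Collect the set of deciles that each distinct price lands in.
bins_by_price = defaultdict(set)
for price, decile in zip(prices, deciles):
    bins_by_price[price].add(decile)

# Prices whose rows were split across more than one decile.
split_prices = {p: sorted(b) for p, b in bins_by_price.items() if len(b) > 1}
print(split_prices)
```

Because the cut-offs fall every three rows, the four paintings priced at 12 occupy row positions 2 to 5 and are split between deciles 1 and 2, which is the behaviour described above.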
This is the behaviour in both PostgreSQL and SQL Server, and presumably in the other databases as well.
The question is: is there a way to achieve this?
You can use rank() and integer-divide it by the number of rows per bin.
declare @T table(id int, title varchar(100), price int);
insert into @T(id, title, price) values
(19, 'Deux fillettes, fond jaune et rouge ', 11),
(17, 'Flowers in a Pitcher ', 12),
(5 , 'Composition with Red, Yellow and Blue ', 12),
(18, 'La lecon de musique (The Music Lesson) ', 12),
(9 , 'The Adoration of the Magi ', 12),
(29, 'Self-Portrait ', 14),
(25, 'Symphony in White, No. 1: The White Girl ', 14),
(30, 'The Anatomy Lecture of Dr. Nicolaes Tulp ', 14),
(20, 'Les repasseuses (Women Ironing) ', 14),
(1 , 'The Birth of Venus ', 15),
(12, 'Femme se promenant dans une foret exotique ', 15),
(24, 'Portrait of the Painter’s Mother ', 15),
(28, 'Jeunes filles au piano ', 15),
(7 , 'Portrait de l artiste (Self-portrait) ', 16),
(3 , 'The Last Supper ', 16),
(13, 'Combat of a Tiger and a Buffalo ', 16),
(4 , 'The Creation of Man ', 17),
(22, 'Le Chemin de Fer ', 17),
(6 , 'Femmes de Tahiti [Sur la plage] ', 18),
(21, 'Le Bar aux Folies-Berg ', 18),
(26, 'Lady at the Piano ', 18),
(15, 'Remembrance of a Garden ', 18),
(16, '1914 ', 18),
(14, 'Ancient Sound, Abstract on Black ', 19),
(8 , 'The Large Turf ', 19),
(23, 'On the Beach ', 19),
(2 , 'Portrait of Mona Lisa ', 19),
(27, 'On the Terrace ', 20),
(10, 'The She-Wolf ', 20);
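declare @BinCount int = 10;
declare @BinSize int;
-- Rows per bin, using integer division: 1 + 29 / 10 = 3 for the sample data.
select @BinSize = 1 + count(*) / @BinCount from @T;
-- rank() is the same for tied prices, so rows with equal prices share a bin.
select T.id,
       T.title,
       T.price,
       1 + rank() over(order by T.price) / @BinSize as decile
from @T as T;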
Result:
id title price decile
--- ------------------------------------------- ------ --------------------
19 Deux fillettes, fond jaune et rouge 11 1
17 Flowers in a Pitcher 12 1
5 Composition with Red, Yellow and Blue 12 1
18 La lecon de musique (The Music Lesson) 12 1
9 The Adoration of the Magi 12 1
29 Self-Portrait 14 3
25 Symphony in White, No. 1: The White Girl 14 3
30 The Anatomy Lecture of Dr. Nicolaes Tulp 14 3
20 Les repasseuses (Women Ironing) 14 3
1 The Birth of Venus 15 4
12 Femme se promenant dans une foret exotique 15 4
24 Portrait of the Painter’s Mother 15 4
28 Jeunes filles au piano 15 4
7 Portrait de l artiste (Self-portrait) 16 5
3 The Last Supper 16 5
13 Combat of a Tiger and a Buffalo 16 5
4 The Creation of Man 17 6
22 Le Chemin de Fer 17 6
6 Femmes de Tahiti [Sur la plage] 18 7
21 Le Bar aux Folies-Berg 18 7
26 Lady at the Piano 18 7
15 Remembrance of a Garden 18 7
16 1914 18 7
14 Ancient Sound, Abstract on Black 19 9
8 The Large Turf 19 9
23 On the Beach 19 9
2 Portrait of Mona Lisa 19 9
27 On the Terrace 20 10
10 The She-Wolf 20 10
"and I am not fussy about which decile"
Note that, with the sample data, bins 2 and 8 end up empty.
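The arithmetic behind the result can be checked outside the database. The following Python sketch recomputes the decile for each distinct price with the same formula, 1 + rank / bin size, and lists the bins that receive no rows. The variable names are illustrative and the prices are copied from the sample data above; it mirrors the query rather than replacing it.

```python
# Prices of the 29 sample paintings, sorted ascending (titles omitted).
prices = ([11] + [12] * 4 + [14] * 4 + [15] * 4 + [16] * 3
          + [17] * 2 + [18] * 5 + [19] * 4 + [20] * 2)

bin_count = 10
bin_size = 1 + len(prices) // bin_count  # 1 + 29 // 10 = 3, as in @BinSize

# rank() of a price is the position of its first row in the sorted list,
# so all rows with the same price share one rank.
rank_by_price = {}
for position, price in enumerate(prices, start=1):
    rank_by_price.setdefault(price, position)

# Same expression as the query: 1 + rank() / @BinSize with integer division.
decile_by_price = {p: 1 + r // bin_size for p, r in rank_by_price.items()}

# Bins in 1..bin_count that no price maps to.
used_bins = set(decile_by_price.values())
empty_bins = [b for b in range(1, bin_count + 1) if b not in used_bins]

print(decile_by_price)
print(empty_bins)
```

The gaps come from the jumps in rank(): the prices 11 and 12 have ranks 1 and 2, which both divide down to bin 1, while the next price, 14, has rank 6 and so lands in bin 3, skipping bin 2. Bin 8 is skipped in the same way between the prices 18 (rank 19) and 19 (rank 24).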