Find rows with duplicate values in a column

Question


use*_*906 5 sql postgresql aggregate-functions duplicates window-functions

I have a table author_data:

 author_id | author_name
 ----------+----------------
 9         | ernest jordan
 14        | k moribe
 15        | ernest jordan
 25        | william h nailon 
 79        | howard jason
 36        | k moribe


Now I need the following result:

 author_id | author_name                                                  
 ----------+----------------
 9         | ernest jordan
 15        | ernest jordan     
 14        | k moribe 
 36        | k moribe


That is, I need the author_id for every name that appears more than once. I tried this statement:

select author_id,count(author_name)
from author_data
group by author_name
having count(author_name)>1


But it doesn't work. How can I get this result?

Answer 1

Erw*_*ter 9

I suggest a window function in a subquery:

SELECT author_id, author_name  -- omit the name here, if you just need ids
FROM (
   SELECT author_id, author_name
        , count(*) OVER (PARTITION BY author_name) AS ct
   FROM   author_data
   ) sub
WHERE  ct > 1;


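You will recognize the basic aggregate function count(). Appending an OVER clause turns it into a window function, just as with any other aggregate function.

This way, it counts the rows in each partition, here the rows that share the same author_name. Voilà.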
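
To make the per-partition count concrete, here is a minimal sketch that runs the inner query on its own against the sample data from the question. The table definition (a temp table, the column types, the primary key) is an assumption for illustration; the question shows only the data, not the schema.

```sql
-- Hypothetical setup mirroring the sample rows from the question.
CREATE TEMP TABLE author_data (
   author_id   int PRIMARY KEY   -- assumed unique
 , author_name text NOT NULL     -- assumed type
);

INSERT INTO author_data (author_id, author_name) VALUES
   (9 , 'ernest jordan')
 , (14, 'k moribe')
 , (15, 'ernest jordan')
 , (25, 'william h nailon')
 , (79, 'howard jason')
 , (36, 'k moribe');

-- The inner query alone: no row is collapsed; each row carries
-- the number of rows in its author_name partition as ct.
SELECT author_id, author_name
     , count(*) OVER (PARTITION BY author_name) AS ct
FROM   author_data
ORDER  BY author_name, author_id;

--  author_id | author_name      | ct
-- -----------+------------------+----
--          9 | ernest jordan    |  2
--         15 | ernest jordan    |  2
--         79 | howard jason     |  1
--         14 | k moribe         |  2
--         36 | k moribe         |  2
--         25 | william h nailon |  1
```

Unlike GROUP BY, the window function keeps every row, so author_id stays available. The outer query then only has to filter on ct > 1. The filter has to live in an outer query because window functions are not allowed in a WHERE clause.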

In older versions without window functions (v8.3 or older), or in general, this alternative performs very fast:

SELECT author_id, author_name  -- omit name, if you just need ids
FROM   author_data a
WHERE  EXISTS (
   SELECT 1
   FROM   author_data a2
   WHERE  a2.author_name = a.author_name
   AND    a2.author_id <> a.author_id
   );


If you are concerned about performance, add an index on author_name.
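
Such an index could be created as sketched below. The index name is a hypothetical choice, and whether the planner actually uses the index depends on table size and statistics, so it is worth checking with EXPLAIN.

```sql
-- Plain B-tree index on the column used for the duplicate lookup.
-- The name author_data_author_name_idx is an assumption; any valid identifier works.
CREATE INDEX author_data_author_name_idx ON author_data (author_name);

-- Inspect the plan of the EXISTS variant to see whether the index is used.
EXPLAIN
SELECT author_id, author_name
FROM   author_data a
WHERE  EXISTS (
   SELECT 1
   FROM   author_data a2
   WHERE  a2.author_name = a.author_name
   AND    a2.author_id <> a.author_id
   );
```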
