MySQL: second-smallest element in each group

bav*_*aza 10 mysql group-by aggregate greatest-n-per-group

I have a table similar to the following:

    date    |   expiry
-------------------------    
2010-01-01  | 2010-02-01
2010-01-01  | 2010-03-02
2010-01-01  | 2010-04-04
2010-02-01  | 2010-03-01
2010-02-01  | 2010-04-02

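In this table, each date can have several "expiry" values. I need a query that returns the nth smallest expiry for each date. For example, for n = 2, I would expect: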

     date    |   expiry
-------------------------       
2010-01-01  | 2010-03-02
2010-02-01  | 2010-04-02

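My problem is that, AFAIK, there is no aggregate function that returns the nth largest/smallest element, so I can't simply use GROUP BY. More specifically, if I had a magic MIN() aggregate that accepted a second argument, 'offset', I would write: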

SELECT MIN(expiry, 1) FROM table WHERE date IN ('2010-01-01', '2010-02-01') GROUP BY date

Any suggestions?

Dam*_*n R 10

One hack is to use group_concat. Group by date, concatenate the expiry dates in ascending order, and use the substring_index function to pick out the nth value.

mysql> select * from expiry;
+------------+------------+
| date       | expiry     |
+------------+------------+
| 2010-01-01 | 2010-02-01 |
| 2010-01-01 | 2010-03-02 |
| 2010-01-01 | 2010-04-04 |
| 2010-02-01 | 2010-03-01 |
| 2010-02-01 | 2010-04-02 |
+------------+------------+
5 rows in set (0.00 sec)

mysql> SELECT mdate,
       Substring_index(Substring_index(edate, ',', 2), ',', -1) AS exp_date
FROM   (SELECT `date`               AS mdate,
               GROUP_CONCAT(expiry order by expiry asc separator ",") AS edate
        FROM   expiry
        GROUP  BY mdate) e1;  
+------------+------------+
| mdate      | exp_date   |
+------------+------------+
| 2010-01-01 | 2010-03-02 |
| 2010-02-01 | 2010-04-02 |
+------------+------------+
2 rows in set (0.00 sec)

In the example here, the subquery produces the following output:

+------------+----------------------------------+
| mdate      | edate                            |
+------------+----------------------------------+
| 2010-01-01 | 2010-02-01,2010-03-02,2010-04-04 |
| 2010-02-01 | 2010-03-01,2010-04-02            |
+------------+----------------------------------+

substring_index(edate,',',2) keeps the first 2 elements of the list (for the nth element, replace 2 with n).

+------------+------------------------------+
| mdate      | substring_index(edate,',',2) |
+------------+------------------------------+
| 2010-01-01 | 2010-02-01,2010-03-02        |
| 2010-02-01 | 2010-03-01,2010-04-02        |
+------------+------------------------------+

We then run a second substring_index on that output, substring_index(substring_index(edate,',',2),',',-1), to keep only the second element (the last element of the intermediate result):

+------------+------------------------------------------------------+
| mdate      | substring_index(substring_index(edate,',',2),',',-1) |
+------------+------------------------------------------------------+
| 2010-01-01 | 2010-03-02                                           |
| 2010-02-01 | 2010-04-02                                           |
+------------+------------------------------------------------------+
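
To see the two calls in isolation, they can be run against a literal string. This is a minimal sketch using the concatenated value of the first group above; the aliases first_two and second_value are illustrative names and not part of the original answer.

```sql
-- Positive count: keep everything before the 2nd comma (the first two elements).
SELECT SUBSTRING_INDEX('2010-02-01,2010-03-02,2010-04-04', ',', 2) AS first_two;
-- returns '2010-02-01,2010-03-02'

-- Negative count: keep everything after the last comma (the final element).
SELECT SUBSTRING_INDEX('2010-02-01,2010-03-02', ',', -1) AS second_value;
-- returns '2010-03-02'
```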

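If too many values are concatenated, the result may exceed group_concat_max_len (the default is 1024, but it can be set higher), in which case the concatenated string is truncated.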
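
If larger groups are expected, the limit can be inspected and raised before running the query. A small sketch, assuming a session-level change is acceptable; the value 1000000 is an arbitrary illustration, not a recommendation.

```sql
-- Show the current limit (in bytes) on the length of a GROUP_CONCAT result.
SHOW VARIABLES LIKE 'group_concat_max_len';

-- Raise the limit for the current session only (arbitrary example value).
SET SESSION group_concat_max_len = 1000000;
```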

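Update: the SQL above still returns a value when a group has fewer than n elements (substring_index then yields the last element that is available). To avoid this, the SQL can be modified as follows: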

SELECT mdate,
       IF(cnt >= 2,Substring_index(Substring_index(edate, ',', 2), ',', -1),NULL) AS exp_date
FROM   (SELECT `date`               AS mdate,
               count(expiry) as cnt,
               GROUP_CONCAT(expiry order by expiry asc separator ",") AS edate
        FROM   expiry
        GROUP  BY mdate) e1;  
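
Since the question asks for the nth smallest value in general, the literal 2 can be pulled out into a variable. The following is only a sketch of that idea, assuming the same expiry table as above; @n is a user-defined session variable whose name was chosen for this example.

```sql
-- @n selects which element to return (1 = smallest, 2 = second smallest, ...).
SET @n = 2;

SELECT mdate,
       -- Return NULL when the group has fewer than @n rows.
       IF(cnt >= @n,
          SUBSTRING_INDEX(SUBSTRING_INDEX(edate, ',', @n), ',', -1),
          NULL) AS exp_date
FROM   (SELECT `date` AS mdate,
               COUNT(expiry) AS cnt,
               GROUP_CONCAT(expiry ORDER BY expiry ASC SEPARATOR ',') AS edate
        FROM   expiry
        GROUP  BY `date`) e1;
```

With @n = 2 this should give the same two rows as the query above; with @n = 3 the 2010-02-01 group has only two rows, so its exp_date comes back as NULL.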