Remove duplicate values from a database

Ahs*_*tar 15 mysql sql database

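I have a MySQL table that is populated with price values every day. It records an entry each day even if the price has not changed. I want to delete some of the rows that repeat too often, keeping only the first and the last row of each stretch of unchanged price.

Example 1)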

   id name     price date
    1 Product1 $6 13/07/2017
    2 Product1 $6 14/07/2017
    3 Product1 $6 15/07/2017
    4 Product1 $7 16/07/2017
    5 Product1 $6 17/07/2017
    6 Product1 $6 18/07/2017
    7 Product1 $6 19/07/2017

From this list, the records with id 2 and 6 should be deleted, giving the following result:

   id name     price date
    1 Product1 $6 13/07/2017
    3 Product1 $6 15/07/2017
    4 Product1 $7 16/07/2017
    5 Product1 $6 17/07/2017
    7 Product1 $6 19/07/2017

Example 2)

   id name     price date
    1 Product1 $6 13/07/2017
    2 Product1 $6 14/07/2017
    3 Product1 $6 15/07/2017
    4 Product1 $6 16/07/2017
    5 Product1 $6 17/07/2017
    6 Product1 $6 18/07/2017
    7 Product1 $6 19/07/2017

There is no price change here, so I can delete all records from 2 to 6:

   id name     price date
    1 Product1 $6 13/07/2017
    7 Product1 $6 19/07/2017

The id is not guaranteed to be sequential, and the dates are not necessarily consecutive days.
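To pin down the rule stated above, here is a minimal Python sketch (not MySQL) that computes which ids would be deleted. The function name `ids_to_delete` and the `(id, name, price, date)` tuple layout are assumptions made for illustration. It orders the rows by name and date, splits them into runs of equal price, and keeps only the first and last row of each run.

```python
from datetime import date
from itertools import groupby


def ids_to_delete(rows):
    """rows: iterable of (id, name, price, date) tuples.

    Returns the ids of rows that are neither the first nor the last
    row of a run of consecutive equal prices for the same product.
    """
    doomed = []
    # Order by product, then by date, so runs of equal prices are adjacent.
    ordered = sorted(rows, key=lambda r: (r[1], r[3]))
    # groupby only merges neighbouring rows, so a price that comes back
    # later (6, 7, 6) starts a new run instead of joining the old one.
    for _, run in groupby(ordered, key=lambda r: (r[1], r[2])):
        run = list(run)
        # Keep run[0] and run[-1]; everything strictly between them goes.
        doomed.extend(r[0] for r in run[1:-1])
    return doomed


example1 = [
    (1, "Product1", 6, date(2017, 7, 13)),
    (2, "Product1", 6, date(2017, 7, 14)),
    (3, "Product1", 6, date(2017, 7, 15)),
    (4, "Product1", 7, date(2017, 7, 16)),
    (5, "Product1", 6, date(2017, 7, 17)),
    (6, "Product1", 6, date(2017, 7, 18)),
    (7, "Product1", 6, date(2017, 7, 19)),
]
# Example 2: same rows, but the price never changes.
example2 = [(i, n, 6, d) for (i, n, _, d) in example1]

print(ids_to_delete(example1))  # [2, 6]
print(ids_to_delete(example2))  # [2, 3, 4, 5, 6]
```

Because the sketch relies only on ordering by date, it does not need sequential ids or consecutive days.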

Bil*_*win 5

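You can do this with some creative self-join logic.

Think of three hypothetical rows in the table.

  • Row a, which you want to keep.
  • Row b, which has the same product name and price as a, and a date 1 day after the date of a. You want to delete this one.
  • Row c, which has the same product name and price, and a date 1 day after the date of b. You want to keep this one.

So if you can do a self-join that matches these three rows, you delete row b.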

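DELETE b FROM MyTable AS a
-- b: same name and price as a, dated one day after a
JOIN MyTable AS b ON b.name = a.name AND b.price = a.price AND b.date = a.date + INTERVAL 1 DAY
-- c: same name and price as b, dated one day after b
JOIN MyTable AS c ON c.name = b.name AND c.price = b.price AND c.date = b.date + INTERVAL 1 DAY;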

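This works even if multiple rows satisfy the conditions for row b. It will delete the first one, and then go on to delete the subsequent rows that also satisfy the conditions.

This works if you use the DATE data type and store dates as "YYYY-MM-DD", not "DD-MM-YYYY". You should do that anyway.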
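As a rough illustration of why the storage format matters, the Python sketch below (not MySQL) compares day-first text with ISO "YYYY-MM-DD" text. The helper `to_iso` and the sample values are assumptions for illustration, and the slash-separated input format is taken from the sample data in the question. Day-first strings sort by day of month, while ISO strings sort chronologically.

```python
from datetime import datetime


def to_iso(day_first):
    """Convert 'DD/MM/YYYY' text into 'YYYY-MM-DD' text."""
    return datetime.strptime(day_first, "%d/%m/%Y").date().isoformat()


raw = ["13/07/2017", "02/08/2017", "14/07/2017"]

# Sorted as text, 2 August lands before 13 July.
print(sorted(raw))                     # ['02/08/2017', '13/07/2017', '14/07/2017']

# After conversion, text order and calendar order agree.
print(sorted(to_iso(d) for d in raw))  # ['2017-07-13', '2017-07-14', '2017-08-02']
```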


use*_*679 2

This is the second answer I have submitted for this question, but I think I have finally got it this time:

DELETE FROM products WHERE id IN (
    SELECT id_to_delete
    FROM (
        SELECT
            t0.id AS id_to_delete,
            t0.price,
            (
                SELECT t1.price
                FROM products AS t1
                WHERE (t0.date < t1.date)
                    AND (t0.name = t1.name)
                ORDER BY t1.date ASC
                LIMIT 1
            ) AS next_price,
            (
                SELECT t2.price
                FROM products AS t2
                WHERE (t0.date > t2.date)
                    AND (t0.name = t2.name)
                ORDER BY t2.date DESC
                LIMIT 1
            ) AS prev_price
        FROM products AS t0
        HAVING (price = next_price) AND (price = prev_price)
    ) AS t
)

This is a modified version of @vadim_hr's answer.
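The prev_price / next_price comparison in the query above can be sketched in Python (not MySQL) as follows. The function name `ids_to_delete_by_neighbours`, the tuple layout, and the second product in the sample data are assumptions for illustration. A row is marked for deletion only when the previous and the next row of the same product, ordered by date, both carry the same price.

```python
def ids_to_delete_by_neighbours(rows):
    """rows: list of (id, name, price, date) tuples, unique on (name, date).

    Dates are ISO 'YYYY-MM-DD' strings, so text order is calendar order.
    """
    ordered = sorted(rows, key=lambda r: (r[1], r[3]))
    doomed = []
    # Slide a window of three neighbouring rows over the ordered list.
    for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
        same_product = prev[1] == cur[1] == nxt[1]
        same_price = prev[2] == cur[2] == nxt[2]
        if same_product and same_price:
            doomed.append(cur[0])
    return doomed


rows = [
    (1, "Product1", 6, "2017-07-13"),
    (2, "Product1", 6, "2017-07-14"),
    (3, "Product1", 6, "2017-07-15"),
    (4, "Product1", 7, "2017-07-16"),
    (5, "Product1", 6, "2017-07-17"),
    (6, "Product1", 6, "2017-07-18"),
    (7, "Product1", 6, "2017-07-19"),
    (8, "Product2", 6, "2017-07-13"),
    (9, "Product2", 6, "2017-07-14"),
    (10, "Product2", 6, "2017-07-15"),
]

print(ids_to_delete_by_neighbours(rows))  # [2, 6, 9]
```

Row 7 and row 8 survive even though their neighbours share the price, because those neighbours belong to a different product.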

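EDIT: Below is a different query that filters with JOINs instead of subqueries. On large data sets the JOINs may be faster than the previous query (above), but I will leave the performance testing to you.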

http://sqlfiddle.com/#!9/ee0655/8

SELECT M.id as id_to_delete
FROM
(
    SELECT
        *,
        (@j := @j + 1) AS j
    FROM
    (SELECT * FROM products ORDER BY name ASC, date ASC) AS mmm
    JOIN
    (SELECT @j := 1) AS mm
) AS M     -- the middle table
JOIN
(
    SELECT
        *,
        (@i := @i + 1) AS i
    FROM
    (SELECT * FROM products ORDER BY name ASC, date ASC) AS lll
    JOIN
    (SELECT @i := 0) AS ll
) AS L     -- the left table
ON M.j = L.i
    AND M.name = L.name
    AND M.price = L.price
JOIN
(
    SELECT
        *,
        (@k := @k + 1) AS k
    FROM
    (SELECT * FROM products ORDER BY name ASC, date ASC) AS rrr
    JOIN
    (SELECT @k := 2) AS rr
) AS R     -- the right table
ON M.j = R.k
    AND M.name = R.name
    AND M.price = R.price

Both queries achieve the same goal, and both assume that rows are unique on name and date (as mentioned in the comments below).