Poorly performing MySQL subquery: can I turn it into a join?

Bri*_*ins 2 mysql sql

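I have a subquery that is causing poor performance. I think it can be rewritten as a join, but I'm having a hard time wrapping my head around it.

The gist of the query is this: for a given combination of EmailAddress and Product, I need the list of ids that are NOT the latest. Those orders will be marked as "obsolete" in the table, leaving only the latest order for a given EmailAddress and Product combination. (Does that make sense?)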

Table definition

CREATE TABLE  `sandbox`.`OrderHistoryTable` (
 `id` INT( 11 ) NOT NULL AUTO_INCREMENT ,
 `EmailAddress` VARCHAR( 100 ) NOT NULL ,
 `Product` VARCHAR( 100 ) NOT NULL ,
 `OrderDate` DATE NOT NULL ,
 `rowlastupdated` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ,
PRIMARY KEY (  `id` ) ,
KEY  `EmailAddress` (  `EmailAddress` ) ,
KEY  `Product` (  `Product` ) ,
KEY  `OrderDate` (  `OrderDate` )
) ENGINE = MYISAM DEFAULT CHARSET = latin1;

Query

SELECT id
FROM
OrderHistoryTable AS EMP1
WHERE
OrderDate not in 
   (
   Select max(OrderDate)
   FROM OrderHistoryTable AS EMP2
   WHERE 
       EMP1.EmailAddress =  EMP2.EmailAddress
   AND EMP1.Product IN ('ProductA','ProductB','ProductC','ProductD')
   AND EMP2.Product IN ('ProductA','ProductB','ProductC','ProductD')
   )

Explanation of the repeated 'IN' statements

13   bob@aol.com  ProductA  2010-10-01
15   bob@aol.com  ProductB  2010-20-02
46   bob@aol.com  ProductD  2010-20-03
57   bob@aol.com  ProductC  2010-20-04
158  bob@aol.com  ProductE  2010-20-05
206  bob@aol.com  ProductB  2010-20-06
501  bob@aol.com  ProductZ  2010-20-07

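The result of my query should be | 13 | | 15 | | 46 | | 57 |

This is because, of the orders listed, these four have been "superseded" by a newer order for a product in the same category. That "category" contains Products A, B, C and D.

Order ids 158 and 501 have no other orders in their respective categories, based on the query.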
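The expected result can be checked by running the original correlated subquery against the sample rows. This is a minimal sketch, not part of the original post: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and it assumes the sample dates are written as YYYY-DD-MM, so they are rewritten here as ISO YYYY-MM-DD.

```python
import sqlite3

# Sample rows from the question. Assumption: the dates in the listing are
# YYYY-DD-MM, so they are converted to ISO YYYY-MM-DD for correct ordering.
ROWS = [
    (13, "bob@aol.com", "ProductA", "2010-01-10"),
    (15, "bob@aol.com", "ProductB", "2010-02-20"),
    (46, "bob@aol.com", "ProductD", "2010-03-20"),
    (57, "bob@aol.com", "ProductC", "2010-04-20"),
    (158, "bob@aol.com", "ProductE", "2010-05-20"),
    (206, "bob@aol.com", "ProductB", "2010-06-20"),
    (501, "bob@aol.com", "ProductZ", "2010-07-20"),
]

# The query from the question, with ORDER BY added for a stable result.
ORIGINAL_QUERY = """
SELECT id
FROM OrderHistoryTable AS EMP1
WHERE OrderDate NOT IN (
    SELECT MAX(OrderDate)
    FROM OrderHistoryTable AS EMP2
    WHERE EMP1.EmailAddress = EMP2.EmailAddress
      AND EMP1.Product IN ('ProductA','ProductB','ProductC','ProductD')
      AND EMP2.Product IN ('ProductA','ProductB','ProductC','ProductD')
)
ORDER BY id
"""


def outdated_ids(rows):
    """Load the rows into an in-memory table and return the superseded ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OrderHistoryTable ("
        " id INTEGER PRIMARY KEY,"
        " EmailAddress TEXT NOT NULL,"
        " Product TEXT NOT NULL,"
        " OrderDate TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO OrderHistoryTable VALUES (?, ?, ?, ?)", rows)
    ids = [row[0] for row in conn.execute(ORIGINAL_QUERY)]
    conn.close()
    return ids


print(outdated_ids(ROWS))  # [13, 15, 46, 57]
```

Rows 158 and 501 drop out because, for a product outside the list, the subquery matches no rows and MAX returns NULL; `NOT IN (NULL)` is never true, so those rows are not selected.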

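Final query, based on the accepted answer below: I ended up using the following query without a subquery and got roughly 3x the performance (down from 90 seconds to 30 seconds). I also now have a separate "groups" table where I can enumerate the group members instead of spelling them out in the query itself...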

SELECT DISTINCT id, EmailAddress FROM (
  SELECT a.id, a.EmailAddress, a.OrderDate
  FROM OrderHistoryTable a
  INNER JOIN OrderHistoryTable b ON a.EmailAddress = b.EmailAddress
  INNER JOIN groups g1  ON  a.Product = g1.Product 
  INNER JOIN groups g2  ON  b.Product = g2.Product 
  WHERE 
        g1.family = 'ProductGroupX'
    AND g2.family = 'ProductGroupX'
  GROUP BY a.id, a.OrderDate, b.OrderDate
  HAVING  a.OrderDate < MAX(b.OrderDate)
) dtX
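The post does not show the group table itself, so the following is only a sketch of how such a lookup table might look and how the final query behaves on the sample rows. The table name `product_groups` and its layout are assumptions (the name differs from the post's `groups` because GROUPS is a reserved word from MySQL 8.0.2 on); SQLite again stands in for MySQL, and the sample dates are assumed to be YYYY-DD-MM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderHistoryTable (
    id           INTEGER PRIMARY KEY,
    EmailAddress TEXT NOT NULL,
    Product      TEXT NOT NULL,
    OrderDate    TEXT NOT NULL
);
-- Hypothetical lookup table: one row per product and the family it belongs to.
CREATE TABLE product_groups (
    Product TEXT NOT NULL,
    family  TEXT NOT NULL,
    PRIMARY KEY (Product, family)
);
""")
conn.executemany(
    "INSERT INTO OrderHistoryTable VALUES (?, ?, ?, ?)",
    [
        (13, "bob@aol.com", "ProductA", "2010-01-10"),
        (15, "bob@aol.com", "ProductB", "2010-02-20"),
        (46, "bob@aol.com", "ProductD", "2010-03-20"),
        (57, "bob@aol.com", "ProductC", "2010-04-20"),
        (158, "bob@aol.com", "ProductE", "2010-05-20"),
        (206, "bob@aol.com", "ProductB", "2010-06-20"),
        (501, "bob@aol.com", "ProductZ", "2010-07-20"),
    ],
)
conn.executemany(
    "INSERT INTO product_groups VALUES (?, 'ProductGroupX')",
    [("ProductA",), ("ProductB",), ("ProductC",), ("ProductD",)],
)

# Same shape as the final query in the post, with the table renamed
# and an ORDER BY added so the output order is stable.
superseded = conn.execute("""
SELECT DISTINCT id, EmailAddress FROM (
  SELECT a.id, a.EmailAddress, a.OrderDate
  FROM OrderHistoryTable a
  INNER JOIN OrderHistoryTable b ON a.EmailAddress = b.EmailAddress
  INNER JOIN product_groups g1 ON a.Product = g1.Product
  INNER JOIN product_groups g2 ON b.Product = g2.Product
  WHERE g1.family = 'ProductGroupX'
    AND g2.family = 'ProductGroupX'
  GROUP BY a.id, a.OrderDate, b.OrderDate
  HAVING a.OrderDate < MAX(b.OrderDate)
) dtX
ORDER BY id
""").fetchall()
conn.close()

print([row[0] for row in superseded])  # [13, 15, 46, 57]
```

Adding a product to a family then becomes an INSERT into the lookup table rather than an edit to every query that lists the products.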

OMG*_*ies 5

Use:

   SELECT a.id
     FROM OrderHistoryTable AS a
LEFT JOIN (-- latest order date per customer within the product category
           SELECT e.EmailAddress,
                  MAX(e.OrderDate) AS max_date
             FROM OrderHistoryTable AS e
            WHERE e.Product IN ('ProductA','ProductB','ProductC','ProductD')
         GROUP BY e.EmailAddress) AS b ON b.EmailAddress = a.EmailAddress
                                      AND b.max_date = a.OrderDate
    -- no match in b means the row is not the latest order in the category
    WHERE b.EmailAddress IS NULL
      AND a.Product IN ('ProductA','ProductB','ProductC','ProductD')
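The anti-join idea in this answer (LEFT JOIN to the per-customer maximum date, then keep the rows with no match) can be checked against the question's sample rows. This is a sketch under stated assumptions: SQLite via Python's `sqlite3` stands in for MySQL, the sample dates are assumed to be YYYY-DD-MM, and the derived table is grouped by EmailAddress only, because the question treats Products A to D as a single category.

```python
import sqlite3

PRODUCTS = "('ProductA','ProductB','ProductC','ProductD')"

# Anti-join: rows of a that find no partner in b are not the latest order.
ANTI_JOIN_QUERY = f"""
SELECT a.id
FROM OrderHistoryTable AS a
LEFT JOIN (
    SELECT e.EmailAddress, MAX(e.OrderDate) AS max_date
    FROM OrderHistoryTable AS e
    WHERE e.Product IN {PRODUCTS}
    GROUP BY e.EmailAddress
) AS b ON b.EmailAddress = a.EmailAddress
      AND b.max_date = a.OrderDate
WHERE b.EmailAddress IS NULL
  AND a.Product IN {PRODUCTS}
ORDER BY a.id
"""


def superseded_by_join(rows):
    """Return ids of in-category orders that are not the customer's latest."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE OrderHistoryTable ("
        " id INTEGER PRIMARY KEY,"
        " EmailAddress TEXT NOT NULL,"
        " Product TEXT NOT NULL,"
        " OrderDate TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO OrderHistoryTable VALUES (?, ?, ?, ?)", rows)
    ids = [row[0] for row in conn.execute(ANTI_JOIN_QUERY)]
    conn.close()
    return ids


sample = [
    (13, "bob@aol.com", "ProductA", "2010-01-10"),
    (15, "bob@aol.com", "ProductB", "2010-02-20"),
    (46, "bob@aol.com", "ProductD", "2010-03-20"),
    (57, "bob@aol.com", "ProductC", "2010-04-20"),
    (158, "bob@aol.com", "ProductE", "2010-05-20"),
    (206, "bob@aol.com", "ProductB", "2010-06-20"),
    (501, "bob@aol.com", "ProductZ", "2010-07-20"),
]
print(superseded_by_join(sample))  # [13, 15, 46, 57]
```

The derived table is computed once rather than once per outer row, which is the point of replacing the correlated subquery.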