Can I optimize this MERGE statement?

cat*_*dev 6 sql-server optimization merge

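I am trying to do a single-column merge between two tables. The first table (VisitorSession) has 40,000,000 rows. The second (ShoppingCart) has 9,000,000 rows.

In my dev environment the query takes just under 8 minutes. Production should be faster because the machine is more powerful, but I still expect the query to take at least 2 minutes there. I know this query causes timeouts for other developers in the dev environment, which means it could easily cause timeouts for customers. Is there a safer and/or faster way to perform this query?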

declare @dt datetime = cast(dateadd(month, -6, getdate()) as date);

merge ShoppingCart as TargetTable  -- 07:55 to complete in Dev
using 
(
  select * from -- 04:55 to run select, resulting in 12,727,927 rows in Dev
  (
    select
      visitorid  -- int, not null, foreign key
      ,useripaddress  -- varchar(55), null
      ,row_number() over 
      (partition by visitorid order by createdate desc) as [row]
    from VisitorSession (nolock)
    where UserIPAddress is not null
    and CreateDate > @dt   -- createdate is a datetime, not null
  ) as subTbl
  where subTbl.[row] = 1
) as SourceTable
on (TargetTable.VisitorID = SourceTable.VisitorID)  -- visitorid is not a primary key
when matched
  then update set
  TargetTable.UserIpAddress = SourceTable.UserIpAddress;

Aar*_*and 17

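Personally, I don't like MERGE because it has a lot of unresolved bugs.

MERGE also gives a false sense of security regarding optimistic concurrency and race conditions. See Dan Guzman's blog post for more details.

I don't want to be a fear-monger here, but I also find the syntax unintuitive and daunting. So I would only use it where I really need it and where I can prove I am not affected by any of the issues above. I don't see what I would gain from using it for an operation that can only ever end up as an UPDATE anyway.

So here is how I would do it, using syntax I am more familiar with: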

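-- @dt as in the question: only sessions from the last six months
DECLARE @dt DATE = DATEADD(MONTH, -6, SYSDATETIME());

;WITH s AS 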
(
  SELECT VisitorID, UserIpAddress FROM 
  (
    SELECT 
      VisitorID,
      UserIpAddress,
      rn = ROW_NUMBER() OVER (PARTITION BY VisitorID ORDER BY CreateDate DESC)
    FROM dbo.VisitorSession
    WHERE UserIpAddress IS NOT NULL
    AND CreateDate > @dt
  ) AS x
  WHERE rn = 1
)
UPDATE c
  SET c.UserIpAddress = s.UserIpAddress
  FROM dbo.ShoppingCart AS c
  INNER JOIN s
  ON c.VisitorID = s.VisitorID;

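You can also break this operation up into chunks to reduce the impact on the transaction log, which in turn can reduce the overall duration. I have blogged about this approach.

Here is how I would handle it in chunks: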

DECLARE 
  @dt DATE = DATEADD(MONTH, -6, SYSDATETIME()), 
  @rc INT = 1;

WHILE @rc > 0
BEGIN

  BEGIN TRANSACTION;

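  ;WITH s AS 
  (
    SELECT TOP (100000) x.VisitorID, x.UserIpAddress FROM
    (
      SELECT 
        VisitorID,
        UserIpAddress,
        rn = ROW_NUMBER() OVER (PARTITION BY VisitorID ORDER BY CreateDate DESC)
      FROM dbo.VisitorSession
      WHERE UserIpAddress IS NOT NULL
      AND CreateDate > @dt
    ) AS x
    WHERE x.rn = 1
    -- Compare only each visitor's latest address with the cart, after rn is
    -- computed; filtering before ROW_NUMBER() would let an older address rank
    -- first once the cart already holds the latest one, and the loop would
    -- never settle.
    AND EXISTS
    ( 
      SELECT 1 FROM dbo.ShoppingCart AS c
        WHERE c.VisitorID = x.VisitorID
        AND (c.UserIpAddress <> x.UserIpAddress
        OR c.UserIpAddress IS NULL)
    )
  )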
  UPDATE c
    SET c.UserIpAddress = s.UserIpAddress
    FROM dbo.ShoppingCart AS c
    INNER JOIN s
    ON c.VisitorID = s.VisitorID;

  SET @rc = @@ROWCOUNT;

  COMMIT TRANSACTION;
END

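Of course, as the blog post shows, you can gain just as much time by making sure your log is large enough to handle the entire transaction without having to grow. Most of the delay probably comes from the many autogrow operations needed to accommodate your large transaction. Sadly, until you have actually performed the operation, it is hard to guess how much transaction log you will need...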
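
As a rough sketch of the pre-sizing idea, you could check the current log size and usage and then grow the log file once, up front. The database name `YourDatabase`, the logical file name `YourDatabase_log`, and the 16 GB target are all placeholders I made up for illustration; substitute your own values, and note that the new size must be larger than the current one.

```sql
-- Hypothetical names: YourDatabase and YourDatabase_log are placeholders.
USE YourDatabase;

-- Current log file size and autogrowth setting (size is stored in 8 KB pages).
SELECT name,
       size * 8 / 1024 AS size_mb,
       growth,
       is_percent_growth
FROM sys.database_files
WHERE type_desc = 'LOG';

-- Log size and percentage of log space in use, for every database.
DBCC SQLPERF(LOGSPACE);

-- Grow the log once, before the big update, instead of relying on
-- many small autogrow events while the transaction is running.
-- 16GB is an arbitrary example value, not a recommendation.
ALTER DATABASE YourDatabase
MODIFY FILE (NAME = N'YourDatabase_log', SIZE = 16GB);
```

Running the update against a restored copy first and watching `DBCC SQLPERF(LOGSPACE)` while it runs is one way to get an estimate of how much log the real run is likely to need.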