Tags: sql-server, partitioning, select, sql-server-2012, group-by
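Background
I have several devices, and each device has several sensors. From time to time I record their readings and store them in the table described below. When someone requests a web page, I fetch several of these values (the most recently recorded ones) one after another and display them to the user. At the moment this takes far too long: there are many values to fetch, each fetch takes about 8 ms, and altogether it adds roughly 300 ms to the page load time, on a page that is otherwise fairly quick.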
CREATE TABLE [dbo].[SensorValues](
[DeviceId] [int] NOT NULL,
[SensorId] [int] NOT NULL,
[SensorValue] [int] NOT NULL,
[Date] [int] NOT NULL, --- stored as unixtime
CONSTRAINT [PK_SensorValues] PRIMARY KEY CLUSTERED
(
[DeviceId] ASC,
[SensorId] ASC,
[Date] DESC
)
);
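The table is partitioned weekly on the Date column.
What I do now
So, here is what I do: in each partition I select the most recent value at or before the given date/time (@fDate), and then pick the most recent of those.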
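The question does not show the partitioning DDL. A minimal sketch of weekly partitioning on the int Unix-time Date column could look like the following, assuming RANGE RIGHT and a single filegroup. Only the function name SensorValues_Date_PF comes from the question; the scheme name SensorValues_Date_PS and the boundary values are hypothetical.

```sql
-- Weekly boundaries as Unix time: 2012-01-01, 2012-01-08, 2012-01-15 (00:00 UTC).
-- RANGE RIGHT: each boundary value is the first value of the partition to its right.
CREATE PARTITION FUNCTION SensorValues_Date_PF (int)
AS RANGE RIGHT FOR VALUES (1325376000, 1325980800, 1326585600);

-- Hypothetical scheme name; every partition on PRIMARY for simplicity.
CREATE PARTITION SCHEME SensorValues_Date_PS
AS PARTITION SensorValues_Date_PF ALL TO ([PRIMARY]);

-- The table (and its clustered primary key) would then be created
-- ON SensorValues_Date_PS ([Date]).

-- 3 boundaries give 4 partitions; a value equal to the second
-- boundary lands in partition 3.
SELECT $PARTITION.SensorValues_Date_PF(1325980800) AS PartitionNumber;
```

New weekly boundaries would be added over time (ALTER PARTITION SCHEME ... NEXT USED, then ALTER PARTITION FUNCTION ... SPLIT RANGE).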
SELECT TOP (1) ca.SensorValue, ca.Date
FROM sys.partitions AS p
CROSS APPLY
(
SELECT TOP (1) v.Date, v.SensorValue
FROM dbo.SensorValues AS v
WHERE $PARTITION.SensorValues_Date_PF(v.Date) = p.[partition_number]
AND v.DeviceId = @fDeviceId
AND v.SensorId = @fSensorId
AND v.Date <= @fDate
ORDER BY v.Date DESC
) AS ca
WHERE p.[partition_number] <= $PARTITION.SensorValues_Date_PF(@fDate)
AND p.[object_id] = OBJECT_ID(N'dbo.SensorValues', N'U')
AND p.index_id = 1
ORDER BY p.[partition_number] DESC, ca.Date DESC;
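What I want to do
I want to select all the values in a single query, for example the latest values for DeviceId=1 and SensorId=1,2,3,4,5. So far I have come up with the following, which uses the IN keyword to fetch values for several sensors. However, I still need to group them by sensor and keep only the row with the latest date for each one. I was thinking of adding a GROUP BY clause, but I don't know how to use it correctly (everything I have tried so far has failed).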
SELECT ca.SensorValue, ca.Date
FROM sys.partitions AS p
CROSS APPLY
(
SELECT TOP (1) v.Date, v.SensorValue
FROM dbo.SensorValues AS v
WHERE $PARTITION.SensorValues_Date_PF(v.Date) = p.[partition_number]
AND v.DeviceId = @fDeviceId
AND v.SensorId IN (@fSensorId1, @fSensorId2, @fSensorId3)
AND v.Date <= @fDate
ORDER BY v.Date DESC
) AS ca
WHERE p.[partition_number] <= $PARTITION.SensorValues_Date_PF(@fDate)
AND p.[object_id] = OBJECT_ID(N'dbo.SensorValues', N'U')
AND p.index_id = 1
ORDER BY p.[partition_number] DESC, ca.Date DESC;
First, note that your "What I do now" query:
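For comparison, the GROUP BY idea mentioned above could be written along these lines: find the latest Date per sensor, then join back to fetch the value. Note that in the IN version above, TOP (1) inside the APPLY returns one row per partition across all listed sensors, not one per sensor. This is a sketch using the question's parameter names, not the approach taken in the answer; because the index is ordered by Date only within each partition, this shape is likely to read every matching row across all partitions before aggregating, so check the execution plan.

```sql
-- Latest reading per sensor via GROUP BY, then a join back for the value.
-- (DeviceId, SensorId, Date) is the primary key, so the join returns
-- at most one row per sensor.
SELECT v.SensorId, v.SensorValue, v.Date
FROM
(
    SELECT SensorId, MaxDate = MAX(Date)
    FROM dbo.SensorValues
    WHERE DeviceId = @fDeviceId
      AND SensorId IN (@fSensorId1, @fSensorId2, @fSensorId3)
      AND Date <= @fDate
    GROUP BY SensorId
) AS latest
JOIN dbo.SensorValues AS v
    ON  v.DeviceId = @fDeviceId
    AND v.SensorId = latest.SensorId
    AND v.Date     = latest.MaxDate;
```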
SELECT TOP (1)
ca.SensorValue,
ca.Date
FROM sys.partitions AS p
CROSS APPLY
(
SELECT TOP (1)
v.Date,
v.SensorValue
FROM SensorValues AS v
WHERE
$PARTITION.SensorValues_Date_PF(v.Date) = p.[partition_number]
AND v.DeviceId = @fDeviceId
AND v.SensorId = @fSensorId
AND v.Date <= @fDate
ORDER BY
v.Date DESC
) AS ca
WHERE
p.[partition_number] <= $PARTITION.SensorValues_Date_PF(@fDate)
AND p.[object_id] = OBJECT_ID(N'dbo.SensorValues', N'U')
AND p.index_id = 1
ORDER BY
p.[partition_number] DESC,
ca.Date DESC;
...produces an execution plan like this:

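The estimated total cost of that plan is 0.02 units. Over 50% of the estimated cost is the final sort, running in Top-N mode. That is only an estimate, but sorts are often expensive, so let's remove it without changing the semantics: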
SELECT TOP (1)
ca.SensorId,
ca.SensorValue,
ca.Date
FROM
(
-- Partition numbers
SELECT DISTINCT
partition_number = prv.boundary_id
FROM
sys.partition_functions AS pf
JOIN sys.partition_range_values AS prv ON
prv.function_id = pf.function_id
WHERE
pf.name = N'SensorValues_Date_PF'
AND prv.boundary_id <= $PARTITION.SensorValues_Date_PF(@fDate)
) AS p
CROSS APPLY
(
SELECT TOP (1)
v.Date,
v.SensorValue,
v.SensorId
FROM dbo.SensorValues AS v
WHERE
$PARTITION.SensorValues_Date_PF(v.Date) = p.partition_number
AND v.DeviceId = @fDeviceId
AND v.SensorId = @fSensorId
AND v.Date <= @fDate
ORDER BY
v.Date DESC
) AS ca
ORDER BY
p.partition_number DESC,
ca.Date DESC
Now the plan has no blocking operators, and in particular no sort. The estimated cost of the new plan below is 0.01 units, spread evenly across the data access methods:

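With that improvement in place, we need one result per sensor ID. That just means building a list of the sensor IDs and APPLYing the previous code to each one: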
SELECT
PerSensor.SensorId,
PerSensor.SensorValue,
PerSensor.Date
FROM
(
-- Sensor ID list
VALUES
(@fSensorId1),
(@fSensorId2),
(@fSensorId3)
) AS Sensor (Id)
CROSS APPLY
(
-- Optimized code applied to each sensor
SELECT TOP (1)
ca.SensorId,
ca.SensorValue,
ca.Date
FROM
(
-- Partition numbers
SELECT DISTINCT
partition_number = prv.boundary_id
FROM
sys.partition_functions AS pf
JOIN sys.partition_range_values AS prv ON
prv.function_id = pf.function_id
WHERE
pf.name = N'SensorValues_Date_PF'
AND prv.boundary_id <= $PARTITION.SensorValues_Date_PF(@fDate)
) AS p
CROSS APPLY
(
SELECT TOP (1)
v.Date,
v.SensorValue,
v.SensorId
FROM dbo.SensorValues AS v
WHERE
$PARTITION.SensorValues_Date_PF(v.Date) = p.partition_number
AND v.DeviceId = @fDeviceId
AND v.SensorId = Sensor.Id -- the sensor from the outer list
AND v.Date <= @fDate
ORDER BY
v.Date DESC
) AS ca
ORDER BY
p.partition_number DESC,
ca.Date DESC
) AS PerSensor;
The query plan is:
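To try the final query in SSMS, its parameters need values. A minimal sketch, assuming int parameters to match the int columns; the device and sensor IDs are made-up examples, and @fDate is computed as the current Unix time to match how Date is stored:

```sql
-- Example parameter values (hypothetical IDs; int to match the table's columns)
DECLARE
    @fDeviceId  int = 1,
    @fSensorId1 int = 1,
    @fSensorId2 int = 2,
    @fSensorId3 int = 3,
    -- Current UTC time as Unix time (seconds since 1970-01-01).
    -- DATEDIFF returns int, so this only works until January 2038.
    @fDate      int = DATEDIFF(SECOND, '19700101', SYSUTCDATETIME());

-- Sanity check: convert the Unix time back to a readable UTC datetime.
SELECT
    NowUnix = @fDate,
    NowUtc  = DATEADD(SECOND, @fDate, '19700101');

-- ...then run the final query in the same batch.
```

For a variable number of sensors (the question asks for five), one option is a table variable such as DECLARE @SensorIds TABLE (Id int PRIMARY KEY), used in place of the VALUES list as FROM @SensorIds AS Sensor; whether the plan keeps the same shape is worth checking.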

The estimated plan cost for three sensor IDs is 0.011, about half the cost of the original single-sensor plan.