Anl*_*nlo 5 sql t-sql sql-server sql-server-2008
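SQL Fiddle: http://sqlfiddle.com/#!3/9b459/6
I have a table that stores the answers to the question "Will you attend this event?". Each user may respond several times, and every answer is stored in the table. Usually we are only interested in the latest answer, and I am trying to build an efficient query for that. I am using SQL Server 2008 R2.
Table contents for one event: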

Column types: int, int, datetime, bit
Primary key: (EventId, MemberId, Timestamp)
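For anyone who wants to reproduce this without the fiddle, the table could be sketched roughly as follows. The table name Attendees and the column names come from the queries further down; the NOT NULL on Answer, the constraint name, and the sample rows are assumptions made up for illustration. Only the order of answers for members 18, 20 and 11 follows the description in the next paragraph, and the timestamps are placeholders, not the real data.

```sql
-- Sketch only: constraint name, NOT NULL on Answer, and all rows are hypothetical.
CREATE TABLE Attendees
(
    EventId     int      NOT NULL,
    MemberId    int      NOT NULL,
    [Timestamp] datetime NOT NULL,  -- bracketed because timestamp is also a type name
    Answer      bit      NOT NULL,  -- 1 = yes, 0 = no
    CONSTRAINT PK_Attendees PRIMARY KEY (EventId, MemberId, [Timestamp])
);

-- Placeholder rows for event 68 (timestamps invented for illustration).
INSERT INTO Attendees (EventId, MemberId, [Timestamp], Answer)
VALUES
    (68, 18, '2013-01-10T09:00:00', 0),  -- member 18: no ...
    (68, 18, '2013-01-11T09:00:00', 1),  -- ... then yes
    (68, 20, '2013-01-10T10:00:00', 1),  -- member 20: yes ...
    (68, 20, '2013-01-12T10:00:00', 0),  -- ... then no
    (68, 11, '2013-01-10T11:00:00', 0),  -- member 11: no ...
    (68, 11, '2013-01-13T11:00:00', 0);  -- ... then no again
```

With these rows, a correct "latest answer" query for event 68 should return exactly one row per member: the later of each pair.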
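Note that member 18 first answered No and then Yes, member 20 first answered Yes and then No, and member 11 answered No and then No again. I want to filter out the earlier answers from these members. There may also be more than one answer to filter out; for example, a user might answer Yes, Yes, No, Yes, No, No, No.
I tried a few different ideas and evaluated them in SQL Server Management Studio by entering all the queries, choosing Display Estimated Execution Plan, and comparing each query's share of the total cost (in percent). Is this a good way to evaluate performance?
The queries tested so far: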
-----------------------------------------------------------------
-- Subquery to select Answer (does not include Timestamp)
-- Cost: 63 %
-----------------------------------------------------------------
select distinct a.EventId, a.MemberId,
    (
        select top 1 Answer
        from Attendees
        where EventId = a.EventId
          and MemberId = a.MemberId
        order by Timestamp desc
    ) as Answer
from Attendees a
where a.EventId = 68
-----------------------------------------------------------------
-- Where with subquery to find max(Timestamp)
-- Cost: 13 %
-----------------------------------------------------------------
select a.EventId, a.MemberId, a.Timestamp, a.Answer
from Attendees a
where a.EventId = 68
  and a.Timestamp =
    (
        select max(Timestamp)
        from Attendees
        where EventId = a.EventId
          and MemberId = a.MemberId
    )
order by a.Timestamp;
-----------------------------------------------------------------
-- Group by to find max(Timestamp)
-- Subquery to select Answer matching max(Timestamp)
-- Cost: 23 %
-----------------------------------------------------------------
select a.EventId, a.MemberId, max(a.Timestamp),
    (
        select top 1 Answer
        from Attendees
        where EventId = a.EventId
          and MemberId = a.MemberId
          and Timestamp = max(a.Timestamp)
    ) as Answer
from Attendees a
where a.EventId = 68
group by a.EventId, a.MemberId
order by max(a.Timestamp);
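It would be nice to avoid running a subquery for every member. In the last query I tried using group by, but I still had to use a subquery for the Answer column. What I would really like is something like this, but of course it is not valid SQL: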
select a.EventId, a.MemberId, max(a.Timestamp), a.Answer <-- Picked from the line selected by max(a.Timestamp)
from Attendees a
where a.EventId = 68
group by a.EventId, a.MemberId
order by max(a.TimeStamp);
Any other ideas for an efficient query?
Edit:
I am very impressed with SQL Fiddle, and I have now entered my actual data: http://sqlfiddle.com/#!3/9b459/6
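SQL Server 2008 supports common table expressions and window functions, so you can number each member's answers from newest to oldest with ROW_NUMBER() and keep only the first one: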
WITH recordsList
AS
(
    SELECT EventID, MemberID, TimeStamp, Answer,
           ROW_NUMBER() OVER (PARTITION BY EventID, MemberID
                              ORDER BY Timestamp DESC) rn
    FROM tableName
)
SELECT EventID, MemberID, TimeStamp, Answer
FROM recordsList
WHERE rn = 1
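Applied to the question's table, the same idea might look like the sketch below. It assumes the table is named Attendees and that only event 68 is wanted, as in the question's own queries; the CTE name LatestAnswers is just an illustrative choice. Filtering inside the CTE means only that event's rows get numbered.

```sql
-- Sketch: latest answer per member for one event, assuming the question's Attendees table.
-- The leading semicolon guards against an unterminated previous statement in the batch.
;WITH LatestAnswers AS
(
    SELECT EventId, MemberId, [Timestamp], Answer,
           -- rn = 1 marks the newest answer of each (EventId, MemberId) pair
           ROW_NUMBER() OVER (PARTITION BY EventId, MemberId
                              ORDER BY [Timestamp] DESC) AS rn
    FROM Attendees
    WHERE EventId = 68
)
SELECT EventId, MemberId, [Timestamp], Answer
FROM LatestAnswers
WHERE rn = 1
ORDER BY [Timestamp];
```

Because the primary key includes Timestamp, two answers from the same member for the same event cannot share a timestamp, so the numbering has no ties to worry about.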
I also prefer the CTE approach, but here is another option, using a subquery, that should work:
SELECT T.EventId, T.MemberId, T.TimeStamp, T.Answer
FROM TableName T
JOIN (
    SELECT EventId, MemberId, MAX(Timestamp) MaxTimeStamp
    FROM TableName
    GROUP BY EventId, MemberId
) T2 ON T.EventId = T2.EventId
    AND T.MemberId = T2.MemberId
    AND T.TimeStamp = T2.MaxTimeStamp
That said, I think the CTE will perform better.
Edit: I'm no longer sure about the performance. Here is a SQL Fiddle with both; you can see the execution plan for each.
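Since the cost percentages in an estimated plan are only estimates, one way to compare the two on real data is to run them with SET STATISTICS IO and SET STATISTICS TIME switched on, then read the logical reads and elapsed times from the Messages tab in Management Studio. A minimal sketch, assuming the question's Attendees table and event 68:

```sql
-- Sketch: compare actual I/O and timing of both approaches (table name and event id assumed).
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Candidate 1: ROW_NUMBER() in a CTE
WITH recordsList AS
(
    SELECT EventId, MemberId, [Timestamp], Answer,
           ROW_NUMBER() OVER (PARTITION BY EventId, MemberId
                              ORDER BY [Timestamp] DESC) AS rn
    FROM Attendees
    WHERE EventId = 68
)
SELECT EventId, MemberId, [Timestamp], Answer
FROM recordsList
WHERE rn = 1;

-- Candidate 2: join against MAX(Timestamp) per member
SELECT T.EventId, T.MemberId, T.[Timestamp], T.Answer
FROM Attendees T
JOIN (
    SELECT EventId, MemberId, MAX([Timestamp]) AS MaxTimeStamp
    FROM Attendees
    WHERE EventId = 68
    GROUP BY EventId, MemberId
) T2 ON T.EventId = T2.EventId
    AND T.MemberId = T2.MemberId
    AND T.[Timestamp] = T2.MaxTimeStamp;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

On a table as small as one event's answers the numbers may be too close to call, so it is probably worth testing against a realistic row count.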
Good luck.