Min/Max Date Values over Large Date Range depending on Value

Jef*_*ott 8 sql t-sql sql-server gaps-and-islands

I'm querying a snapshot of customer data that contains the snapshot date, the customer ID and the 'value' of that customer on that day. I use the LAG function to return the previous day's value so I know whether there was a drop, a rise, a complete loss, or a completely new value (from £0 to > £0).

The end game is to identify the min and max dates where the customer was at £0 value.

Originally I tried MIN(Date) and MAX(Date), grouping by the Customer and Value. However, if a customer dropped to £0 over several separate date ranges, that returns the min of the earliest range and the max of the latest, instead of one row for each range where the value was £0.

I've tried using DENSE_RANK() to split up each customer's values, but doing so just puts all £0 values in the same rank.

Here is some sample code to show you the data I'm working with and how I've tried to split it:

DROP TABLE IF EXISTS #SnapshotTable
CREATE TABLE #SnapshotTable
(
    Row_ID INT IDENTITY(1,1)
    ,SnapshotDate DATE
    ,SnapshotDateKey INT
    ,CustomerId INT
    ,Value DECIMAL(18,2)
)
INSERT INTO #SnapshotTable (SnapshotDate, SnapshotDateKey, CustomerId, Value)
SELECT '2019-01-01', 20190101, 1, 0.00
UNION SELECT '2019-01-02', 20190102, 1, 0.00
UNION SELECT '2019-01-03', 20190103, 1, 5.00
UNION SELECT '2019-01-04', 20190104, 1, 5.00
UNION SELECT '2019-01-05', 20190105, 1, 3.00
UNION SELECT '2019-01-06', 20190106, 1, 3.00
UNION SELECT '2019-01-07', 20190107, 1, 0.00
UNION SELECT '2019-01-08', 20190108, 1, 0.00
UNION SELECT '2019-01-09', 20190109, 1, 10.00
UNION SELECT '2019-01-10', 20190110, 1, 0.00

SELECT * FROM #SnapshotTable

-- Code that doesn't work correctly
SELECT
    CustomerId
    ,Value
    ,MinDate = MIN(SnapshotDateKey)
    ,MaxDate = MAX(SnapshotDateKey)
FROM #SnapshotTable
GROUP BY
    CustomerId
    ,Value

-- Attempted with dense rank
ALTER TABLE #SnapshotTable
ADD DenseRankTest INT NULL
GO
-- Update with Dense Rank
UPDATE TGT
SET 
    TGT.DenseRankTest = SRC.NewRank
FROM #SnapshotTable TGT
INNER JOIN (SELECT
                Row_ID
                ,NewRank = DENSE_RANK() OVER (PARTITION BY CustomerId ORDER BY Value ASC)
            FROM #SnapshotTable

            ) AS SRC
    ON SRC.Row_ID = TGT.Row_ID 

SELECT * FROM #SnapshotTable

Now I can see that DENSE_RANK() is kind of doing what I want, but honestly I've been looking at this for a while and I cannot get my head around how to do it correctly.

Can somebody please advise on what I need to do?

I'm expecting to see:

SELECT [StartDateKey] = 20190101, [EndDateKey] = 20190102, [CustomerId] = 1, [Value] = 0
UNION SELECT [StartDateKey] = 20190103, [EndDateKey] = 20190104, [CustomerId] = 1, [Value] = 5
UNION SELECT [StartDateKey] = 20190105, [EndDateKey] = 20190106, [CustomerId] = 1, [Value] = 3
UNION SELECT [StartDateKey] = 20190107, [EndDateKey] = 20190108, [CustomerId] = 1, [Value] = 0
UNION SELECT [StartDateKey] = 20190109, [EndDateKey] = 20190109, [CustomerId] = 1, [Value] = 10
UNION SELECT [StartDateKey] = 20190110, [EndDateKey] = 20190110, [CustomerId] = 1, [Value] = 0

Edit: For those who stumble across this, with the help of the people here I've found this as a good read for understanding the issue/solving the issue.

Gor*_*off 2

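This is a gaps-and-islands problem. But the accepted answer on the so-called duplicate is not at all the best way to solve it, and the higher-voted answer there is still overly complicated.

A simpler approach is: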

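select customerid, value,
       min(SnapshotDateKey) as StartDateKey,
       max(SnapshotDateKey) as EndDateKey
from (select st.*,
             -- number each customer's rows per value, in date order
             row_number() over (partition by customerid, value order by snapshotdate) as seqnum
      from snapshottable st
     ) st
-- within an unbroken run of one value, date minus seqnum is constant, so it identifies the island
group by dateadd(day, -seqnum, snapshotdate), customerid, value
order by min(SnapshotDateKey);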

Here is a db<>fiddle.
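To see why grouping on the date minus the row number separates the ranges, it can help to look at the intermediate values before aggregating. The following is only a sketch: it inlines the sample rows from the question in a CTE, and the names Snap and grp are made up for illustration. It also assumes one row per customer per day with no missing snapshot dates, since a skipped day would split a run.

```sql
-- Sample rows copied from the question; Snap and grp are illustrative names only.
WITH Snap AS (
    SELECT CAST(v.d AS DATE) AS SnapshotDate, v.CustomerId, v.Value
    FROM (VALUES
        ('2019-01-01', 1,  0.00),
        ('2019-01-02', 1,  0.00),
        ('2019-01-03', 1,  5.00),
        ('2019-01-04', 1,  5.00),
        ('2019-01-05', 1,  3.00),
        ('2019-01-06', 1,  3.00),
        ('2019-01-07', 1,  0.00),
        ('2019-01-08', 1,  0.00),
        ('2019-01-09', 1, 10.00),
        ('2019-01-10', 1,  0.00)
    ) AS v(d, CustomerId, Value)
)
SELECT x.SnapshotDate,
       x.CustomerId,
       x.Value,
       x.seqnum,
       -- consecutive days with the same value share one grp date
       DATEADD(DAY, -x.seqnum, x.SnapshotDate) AS grp
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY s.CustomerId, s.Value
                              ORDER BY s.SnapshotDate) AS seqnum
    FROM Snap s
) AS x
ORDER BY x.SnapshotDate;
```

In this data the £0 rows for 2019-01-07 and 2019-01-08 get seqnum 3 and 4, so both map to 2019-01-04, while the lone £0 row on 2019-01-10 maps to 2019-01-05 and forms its own group. The £3 rows also map to 2019-01-04, which is why Value has to stay in the GROUP BY alongside the computed date.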