Summarize values across a timeline in SQL

uld*_*all 7 sql postgresql aggregate-functions date-arithmetic window-functions

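Problem

I have a PostgreSQL database in which I am trying to sum up the revenue of a cash register. A cash register has either the status ACTIVE or INACTIVE, but I only want to sum up the revenue earned while it was ACTIVE within a given period of time.

I have two tables: one records the revenue, the other records the cash register status: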

CREATE TABLE counters
(
  id bigserial NOT NULL,
  "timestamp" timestamp with time zone,
  total_revenue bigint,
  id_of_machine character varying(50),
  CONSTRAINT counters_pkey PRIMARY KEY (id)
);

CREATE TABLE machine_lifecycle_events
(
  id bigserial NOT NULL,
  event_type character varying(50),
  "timestamp" timestamp with time zone,
  id_of_affected_machine character varying(50),
  CONSTRAINT machine_lifecycle_events_pkey PRIMARY KEY (id)
);

A counters entry is added every minute, and total_revenue only increases. A machine_lifecycle_events entry is added every time the status of a machine changes.

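I have added an image illustrating the problem. The revenue during the blue periods should be summed up.

[Image: timeline showing the problem.]

What I have tried so far

I have created a query that gives me the total revenue at a given instant: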

SELECT total_revenue 
  FROM counters 
 WHERE timestamp < '2014-03-05 11:00:00' 
       AND id_of_machine='1' 
ORDER BY 
       timestamp desc 
 LIMIT 1

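Questions

  1. How do I calculate the revenue earned between two timestamps?
  2. How do I determine the start and end timestamps of the blue periods when I have to compare the timestamps in machine_lifecycle_events with the input period?

Any ideas on how to tackle this problem?

Update

Sample data: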

INSERT INTO counters VALUES
   (1,  '2014-03-01 00:00:00', 100,  '1')
 , (2,  '2014-03-01 12:00:00', 200,  '1')
 , (3,  '2014-03-02 00:00:00', 300,  '1')
 , (4,  '2014-03-02 12:00:00', 400,  '1')
 , (5,  '2014-03-03 00:00:00', 500,  '1')
 , (6,  '2014-03-03 12:00:00', 600,  '1')
 , (7,  '2014-03-04 00:00:00', 700,  '1')
 , (8,  '2014-03-04 12:00:00', 800,  '1')
 , (9,  '2014-03-05 00:00:00', 900,  '1')
 , (10, '2014-03-05 12:00:00', 1000, '1')
 , (11, '2014-03-06 00:00:00', 1100, '1')
 , (12, '2014-03-06 12:00:00', 1200, '1')
 , (13, '2014-03-07 00:00:00', 1300, '1')
 , (14, '2014-03-07 12:00:00', 1400, '1');

INSERT INTO machine_lifecycle_events VALUES
   (1, 'ACTIVE',   '2014-03-01 08:00:00', '1')
 , (2, 'INACTIVE', '2014-03-03 00:00:00', '1')
 , (3, 'ACTIVE',   '2014-03-05 00:00:00', '1')
 , (4, 'INACTIVE', '2014-03-06 12:00:00', '1');

SQL Fiddle with the sample data.

Example query:
The revenue between '2014-03-02 08:00:00' and '2014-03-06 08:00:00' is 300: 100 in the first ACTIVE period and 200 in the second ACTIVE period.

Erw*_*ter 2

Database design

To make my work easier, I sanitized your database design before tackling the questions:

CREATE TEMP TABLE counter (
    id            bigserial PRIMARY KEY
  , ts            timestamp NOT NULL
  , total_revenue bigint NOT NULL
  , machine_id    int NOT NULL
);

CREATE TEMP TABLE machine_event (
    id            bigserial PRIMARY KEY
  , ts            timestamp NOT NULL
  , machine_id    int NOT NULL
  , status_active bool NOT NULL
);

Test case in the fiddle.

Major points

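  • Using ts instead of "timestamp". Never use basic type names as column names.
  • Simplified and unified the name to machine_id and made it an integer, as it should be, instead of varchar(50).
  • event_type varchar(50) should be an integer foreign key, too, or an enum. Or even just a boolean if there is only active / inactive. Simplified to status_active bool.
  • Simplified and sanitized the INSERT statements as well.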
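For reference, the sample data from the question could be loaded into the two tables above roughly like this. This is a sketch and not necessarily what the fiddle contains; it assumes that 'ACTIVE' maps to status_active = true and 'INACTIVE' to false, and that the tables counter and machine_event already exist as defined above.

```sql
-- Sample data from the question, adapted to the cleaned-up tables.
-- Assumption: the varchar machine id '1' becomes the integer 1.
INSERT INTO counter (id, ts, total_revenue, machine_id) VALUES
   (1,  '2014-03-01 00:00:00', 100,  1)
 , (2,  '2014-03-01 12:00:00', 200,  1)
 , (3,  '2014-03-02 00:00:00', 300,  1)
 , (4,  '2014-03-02 12:00:00', 400,  1)
 , (5,  '2014-03-03 00:00:00', 500,  1)
 , (6,  '2014-03-03 12:00:00', 600,  1)
 , (7,  '2014-03-04 00:00:00', 700,  1)
 , (8,  '2014-03-04 12:00:00', 800,  1)
 , (9,  '2014-03-05 00:00:00', 900,  1)
 , (10, '2014-03-05 12:00:00', 1000, 1)
 , (11, '2014-03-06 00:00:00', 1100, 1)
 , (12, '2014-03-06 12:00:00', 1200, 1)
 , (13, '2014-03-07 00:00:00', 1300, 1)
 , (14, '2014-03-07 12:00:00', 1400, 1);

-- Assumption: 'ACTIVE' -> true, 'INACTIVE' -> false.
INSERT INTO machine_event (id, ts, machine_id, status_active) VALUES
   (1, '2014-03-01 08:00:00', 1, true)
 , (2, '2014-03-03 00:00:00', 1, false)
 , (3, '2014-03-05 00:00:00', 1, true)
 , (4, '2014-03-06 12:00:00', 1, false);
```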

Answers

Assumptions

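  • total_revenue only increases (per question).
  • Borders of the outer time frame are included.
  • Every "next" row per machine in machine_event has the opposite status_active.

1. How do I calculate the revenue earned between two timestamps?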

WITH span AS (
   SELECT '2014-03-02 12:00'::timestamp AS s_from  -- start of time range
        , '2014-03-05 11:00'::timestamp AS s_to    -- end of time range
   )
SELECT machine_id, s.s_from, s.s_to
     , max(total_revenue) - min(total_revenue) AS earned
FROM   counter c
     , span s
WHERE  ts BETWEEN s_from AND s_to                  -- borders included!
AND    machine_id =  1
GROUP  BY 1,2,3;
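A possible variation, shown here only as a sketch and not as the query above: if the borders of the time range may fall between two counter rows, the "latest reading at or before a given instant" lookup from the question can be applied to both borders and the two readings subtracted. It assumes the counter table from the database design section and returns NULL if there is no reading at or before s_from.

```sql
-- Variation (assumption, not the query above): subtract the latest counter
-- reading at or before s_from from the latest reading at or before s_to.
WITH span AS (
   SELECT '2014-03-02 12:00'::timestamp AS s_from  -- start of time range
        , '2014-03-05 11:00'::timestamp AS s_to    -- end of time range
   )
SELECT s.s_from, s.s_to
     , (SELECT c.total_revenue
        FROM   counter c
        WHERE  c.machine_id = 1
        AND    c.ts <= s.s_to
        ORDER  BY c.ts DESC
        LIMIT  1)
     - (SELECT c.total_revenue
        FROM   counter c
        WHERE  c.machine_id = 1
        AND    c.ts <= s.s_from
        ORDER  BY c.ts DESC
        LIMIT  1) AS earned
FROM   span s;
```

With the sample data from the question, the two readings are 900 and 400, so earned is 500, the same value max() - min() produces for this range.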

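2. How do I determine the start and end timestamps of the blue periods when I have to compare the timestamps in machine_event with the input period?

This query works for all machines in the given time frame (span).
Add WHERE machine_id = 1 in the CTE cte to select a specific machine.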

WITH span AS (
   SELECT '2014-03-02 08:00'::timestamp AS s_from  -- start of time range
        , '2014-03-06 08:00'::timestamp AS s_to    -- end of time range
   )
, cte AS (
   SELECT machine_id, ts, status_active, s_from
        , lead(ts, 1, s_to) OVER w AS period_end
        , first_value(ts)   OVER w AS first_ts
   FROM   span          s
   JOIN   machine_event e ON e.ts BETWEEN s.s_from AND s.s_to
   WINDOW w AS (PARTITION BY machine_id ORDER BY ts)
   )
SELECT machine_id, ts AS period_start, period_end -- start in time frame
FROM   cte
WHERE  status_active

UNION  ALL                             -- active start before time frame
SELECT machine_id, s_from, ts
FROM   cte
WHERE  NOT status_active
AND    ts =  first_ts
AND    ts <> s_from

UNION  ALL       -- active start before time frame, no end in time frame
SELECT machine_id, s_from, s_to
FROM  (
   SELECT DISTINCT ON (1)
          e.machine_id, e.status_active, s.s_from, s.s_to
   FROM   span          s
   JOIN   machine_event e ON e.ts < s.s_from  -- only from before time range
   LEFT   JOIN cte c USING (machine_id)
   WHERE  c.machine_id IS NULL                -- not in selected time range
   ORDER  BY e.machine_id, e.ts DESC          -- only the latest entry
   ) sub
WHERE  status_active -- only if active
ORDER  BY 1, 2;

The result is the list of blue periods in your image.
SQL Fiddle demonstrating both.

Recent, similar question:
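To get the revenue per blue period, the two steps can be combined. The following is a minimal sketch under stated assumptions: the CTE period is a hypothetical stand-in holding the periods the second query yields for the question's sample data (machine_id, period_start, period_end), and the counter table holds the sample readings. In practice the period query itself would take the place of the VALUES list.

```sql
-- Hypothetical CTE "period": stands in for the output of the period query
-- for the sample data and the range 2014-03-02 08:00 .. 2014-03-06 08:00.
WITH period (machine_id, period_start, period_end) AS (
   VALUES (1, '2014-03-02 08:00'::timestamp, '2014-03-03 00:00'::timestamp)
        , (1, '2014-03-05 00:00'::timestamp, '2014-03-06 08:00'::timestamp)
   )
SELECT p.machine_id, p.period_start, p.period_end
     , max(c.total_revenue) - min(c.total_revenue) AS earned
FROM   period  p
JOIN   counter c ON  c.machine_id = p.machine_id
                 AND c.ts BETWEEN p.period_start AND p.period_end  -- borders included
GROUP  BY 1, 2, 3
ORDER  BY 1, 2;
```

For the sample data this gives 100 for the first period (readings 400 and 500) and 200 for the second (readings 900 to 1100), which adds up to the 300 expected in the question.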
Sum of time Difference across rows