Modifying a SQLite query that uses a CTE

M. *_*ith 2 sql sqlite android common-table-expression recursive-cte

This is my query:

    WITH desc_table(counter, hourly, current_weather_description, current_icons, time_stamp) AS (
Select count(*) AS counter, CASE WHEN  strftime('%M',  'now') < '30' 
                THEN strftime('%H', 'now')  
                ELSE strftime('%H', time_stamp, '+1 hours') END as hourly, 
                current_weather_description,
                current_icons,
                time_stamp
                From weather_events
                GROUP BY strftime('%H',  time_stamp, '+30 minutes'), current_weather_description
                UNION ALL
                Select count(*) as counter, hourly - 1, current_weather_description, current_icons, time_stamp
                From weather_events
                GROUP BY strftime('%H',  time_stamp, '+30 minutes'), current_weather_description
                Order By counter desc limit 1
                ),
        avg_temp_table(avg_temp, hour_seg, time_stamp) AS (
        select avg(current_temperatures) as avg_temp, CASE WHEN  strftime('%M',  time_stamp) < '30' 
                THEN strftime('%H', time_stamp)  
                ELSE strftime('%H', time_stamp, '+1 hours') END as hour_seg, 
                time_stamp
                from weather_events
                group by strftime('%H',  time_stamp, '+30 minutes')
                order by hour_seg desc
                )

                Select  hourly, current_weather_description
                from desc_table
                join avg_temp_table
                on desc_table.hourly=avg_temp_table.hour_seg

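Basically, I have some weather data that I group into hourly intervals (offset by 30 minutes). For each interval I want to count how many times a particular weather description (and its matching icon) occurs, and pick the description with the highest count (desc_table). Then I want the average temperature for that interval (avg_temp_table) (maybe I need a subquery to compute this average instead of how I have it?) and to join the two queries on their hour columns.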

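I want my anchor to be based on the time the query is run (now) and to count the occurrences. Each subsequent recursive member should then subtract one hour, move to the next interval, count again, and so on.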

Sample data (the real data set will have many more rows per interval). The columns are {current_temperatures, current_weather_description, current_icons, time_stamp}:

"87"    "Rain"  "rainicon"  "2016-01-20 02:15:08"
"65"    "Snow"  "snowicon"  "2016-01-20 02:39:08"
"49"    "Rain"  "rainicon"  "2016-01-20 03:15:08"
"49"    "Rain"  "rainicon"  "2016-01-20 03:39:08"
"46"    "Clear" "clearicon" "2016-01-20 04:15:29"
"46"    "Clear" "clearicon" "2016-01-20 04:38:53"
"46"    "Cloudy" "cloudyicon" "2016-01-20 05:15:08"
"46"    "Clear" "clearicon" "2016-01-20 05:39:08"
"45"    "Clear" "clearicon" "2016-01-20 06:14:17"
"45"    "Clear" "clearicon" "2016-01-20 06:34:23"
"45"    "Clear" "clearicon" "2016-01-20 07:24:54"
"45"    "Rain"  "rainicon"  "2016-01-20 07:44:41"
"43"    "Rain"  "rainicon"  "2016-01-20 08:19:08"
"36"    "Clear" "clearicon" "2016-01-20 08:39:08"
"35"    "Meatballs" "meatballsicon" "2016-01-20 09:18:08"
"18"    "Cloudy" "cloudyicon" "2016-01-20 09:39:08"

The output is a join between the average temperature per interval (avg_temp_table) and the output of the first aggregating CTE (desc_table), i.e. {avg_temp, weather_description, current_icon}:

"87"    "Rain"  "rainicon"
"57"    "Rain"  "rainicon"
"47"    "Clear" "clearicon"
"46"    "Clear" "clearicon"
"46"    "Cloudy" "cloudyicon"
"45"    "Clear" "clearicon"
"44"    "Rain"  "rainicon"
"36"    "Clear" "clearicon"
"18"    "Cloudy" "cloudyicon"

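Right now I get a "no such column" error, because my anchor selects from my weather_events table and so does my recursive member. When I change the recursive member to select from desc_table, I get a "recursive aggregate queries not supported" error. But I don't want the recursive member to pull from desc_table. I want to segment by hour, then walk through each hourly interval and get the counts. I'm guessing I've also set up the anchor incorrectly.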

And*_*y M 7

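I am still not sure how your recursive desc_table CTE was supposed to pick the top weather description and its icon for each hour, but that is fine because, going by your verbal description, I think I have come up with a way to do the same without recursion.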

First, group the rows by hour and description, and count the rows in each group:

SELECT
  strftime('%H', time_stamp, '+30 minutes') AS hour,
  current_weather_description,
  current_icons,
  COUNT(*) AS event_count
FROM
  weather_events
GROUP BY
  strftime('%H', time_stamp, '+30 minutes'),
  current_weather_description
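
To try this step against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table and column names come from the question. The column types (INTEGER for the temperature, TEXT for everything else) are an assumption, since the schema is not shown.

```python
import sqlite3

# Sample rows copied from the question. The column types used below are an
# assumption; the question does not show the actual schema.
SAMPLE_ROWS = [
    (87, "Rain", "rainicon", "2016-01-20 02:15:08"),
    (65, "Snow", "snowicon", "2016-01-20 02:39:08"),
    (49, "Rain", "rainicon", "2016-01-20 03:15:08"),
    (49, "Rain", "rainicon", "2016-01-20 03:39:08"),
    (46, "Clear", "clearicon", "2016-01-20 04:15:29"),
    (46, "Clear", "clearicon", "2016-01-20 04:38:53"),
    (46, "Cloudy", "cloudyicon", "2016-01-20 05:15:08"),
    (46, "Clear", "clearicon", "2016-01-20 05:39:08"),
    (45, "Clear", "clearicon", "2016-01-20 06:14:17"),
    (45, "Clear", "clearicon", "2016-01-20 06:34:23"),
    (45, "Clear", "clearicon", "2016-01-20 07:24:54"),
    (45, "Rain", "rainicon", "2016-01-20 07:44:41"),
    (43, "Rain", "rainicon", "2016-01-20 08:19:08"),
    (36, "Clear", "clearicon", "2016-01-20 08:39:08"),
    (35, "Meatballs", "meatballsicon", "2016-01-20 09:18:08"),
    (18, "Cloudy", "cloudyicon", "2016-01-20 09:39:08"),
]

# Step one from the answer: one row per (shifted hour, description) pair.
STEP_ONE = """
SELECT
  strftime('%H', time_stamp, '+30 minutes') AS hour,
  current_weather_description,
  current_icons,
  COUNT(*) AS event_count
FROM
  weather_events
GROUP BY
  strftime('%H', time_stamp, '+30 minutes'),
  current_weather_description
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather_events ("
    "current_temperatures INTEGER, current_weather_description TEXT, "
    "current_icons TEXT, time_stamp TEXT)"
)
conn.executemany("INSERT INTO weather_events VALUES (?, ?, ?, ?)", SAMPLE_ROWS)

groups = conn.execute(STEP_ONE).fetchall()
for row in groups:
    print(row)
```

The '+30 minutes' modifier is what shifts the buckets: a reading taken at 02:39 lands in hour '03', together with the one taken at 03:15.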

Next, group the results of the above query by hour and get the maximum event count per hour:

SELECT
  hour,
  MAX(event_count) AS max_event_count
FROM
  (
    SELECT
      strftime('%H', time_stamp, '+30 minutes') AS hour,
      current_weather_description,
      current_icons,
      COUNT(*) AS event_count
    FROM
      weather_events
    GROUP BY
      strftime('%H', time_stamp, '+30 minutes'),
      current_weather_description
  ) AS s
GROUP BY
  hour

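This is still not what you want, because you actually want the description and icon that match the maximum count, not the count itself. Well, that is easy to fix: just add those columns to the SELECT without adding them to the GROUP BY: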

SELECT
  hour,
  current_weather_description,
  current_icons,
  MAX(event_count) AS max_event_count
FROM
  (
    SELECT
      strftime('%H', time_stamp, '+30 minutes') AS hour,
      current_weather_description,
      current_icons,
      COUNT(*) AS event_count
    FROM
      weather_events
    GROUP BY
      strftime('%H', time_stamp, '+30 minutes'),
      current_weather_description
  ) AS s
GROUP BY
  hour

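You still need to keep MAX(event_count) in the query for the trick to work. It works because in SQLite, when a SELECT statement contains a single MAX or a single MIN call, the values of any selected columns that are neither in the GROUP BY nor inside an aggregate are taken from a row that matches that MAX or MIN value. This non-standard SQL extension is documented in the SQLite 3.7.11 release notes.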
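
The bare-column behaviour is easy to check in isolation. The sketch below uses Python's sqlite3 module and a hypothetical hourly_counts table (not part of the question) that stands in for the output of the inner query. It assumes the bundled SQLite is 3.7.11 or newer, which should hold for any current Python build.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the per-(hour, description) counts.
conn.execute(
    "CREATE TABLE hourly_counts (hour TEXT, description TEXT, event_count INTEGER)"
)
conn.executemany(
    "INSERT INTO hourly_counts VALUES (?, ?, ?)",
    [
        ("06", "Clear", 2),
        ("06", "Rain", 1),
        ("07", "Clear", 1),
        ("07", "Snow", 3),
    ],
)

# description is neither grouped nor aggregated, so SQLite takes it from
# the row that supplies MAX(event_count) within each hour.
rows = conn.execute(
    "SELECT hour, description, MAX(event_count) "
    "FROM hourly_counts GROUP BY hour ORDER BY hour"
).fetchall()
print(rows)  # [('06', 'Clear', 2), ('07', 'Snow', 3)]
```

If two descriptions tie for the maximum within an hour, SQLite may take the bare columns from either row, so add a tie-breaker of your own if that matters.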

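That takes care of desc_table. As for the avg_temp_table CTE, there seems to be nothing wrong with your current approach, except that, for consistency, I would probably use the GROUP BY expression as the hour definition rather than the CASE expression you are using, and time_stamp seems redundant in the result. So the slightly modified CTE would look like this: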

SELECT
  strftime('%H', time_stamp, '+30 minutes') AS hour,
  AVG(current_temperatures) AS avg_temp
FROM
  weather_events
GROUP BY
  strftime('%H', time_stamp, '+30 minutes')

Now you just need to join the two sets on the hour column and select the relevant columns for the final output:

SELECT
  t.avg_temp,
  d.current_weather_description,
  d.current_icons
FROM
  avg_temp_table AS t
  INNER JOIN desc_table AS d on t.hour = d.hour
ORDER BY
  t.hour
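
For reference, here is one way the pieces could be assembled into a single WITH statement and run against the question's sample rows, sketched with Python's sqlite3 module. The CTE names follow the question. The column types in CREATE TABLE are an assumption, since the schema is not shown.

```python
import sqlite3

# Sample rows copied from the question (column types below are assumed).
SAMPLE_ROWS = [
    (87, "Rain", "rainicon", "2016-01-20 02:15:08"),
    (65, "Snow", "snowicon", "2016-01-20 02:39:08"),
    (49, "Rain", "rainicon", "2016-01-20 03:15:08"),
    (49, "Rain", "rainicon", "2016-01-20 03:39:08"),
    (46, "Clear", "clearicon", "2016-01-20 04:15:29"),
    (46, "Clear", "clearicon", "2016-01-20 04:38:53"),
    (46, "Cloudy", "cloudyicon", "2016-01-20 05:15:08"),
    (46, "Clear", "clearicon", "2016-01-20 05:39:08"),
    (45, "Clear", "clearicon", "2016-01-20 06:14:17"),
    (45, "Clear", "clearicon", "2016-01-20 06:34:23"),
    (45, "Clear", "clearicon", "2016-01-20 07:24:54"),
    (45, "Rain", "rainicon", "2016-01-20 07:44:41"),
    (43, "Rain", "rainicon", "2016-01-20 08:19:08"),
    (36, "Clear", "clearicon", "2016-01-20 08:39:08"),
    (35, "Meatballs", "meatballsicon", "2016-01-20 09:18:08"),
    (18, "Cloudy", "cloudyicon", "2016-01-20 09:39:08"),
]

# Both CTEs from the answer, followed by the final join on hour.
FULL_QUERY = """
WITH
  desc_table AS (
    SELECT hour, current_weather_description, current_icons,
           MAX(event_count) AS max_event_count
    FROM (
      SELECT strftime('%H', time_stamp, '+30 minutes') AS hour,
             current_weather_description,
             current_icons,
             COUNT(*) AS event_count
      FROM weather_events
      GROUP BY strftime('%H', time_stamp, '+30 minutes'),
               current_weather_description
    ) AS s
    GROUP BY hour
  ),
  avg_temp_table AS (
    SELECT strftime('%H', time_stamp, '+30 minutes') AS hour,
           AVG(current_temperatures) AS avg_temp
    FROM weather_events
    GROUP BY strftime('%H', time_stamp, '+30 minutes')
  )
SELECT t.avg_temp, d.current_weather_description, d.current_icons
FROM avg_temp_table AS t
INNER JOIN desc_table AS d ON t.hour = d.hour
ORDER BY t.hour
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather_events ("
    "current_temperatures INTEGER, current_weather_description TEXT, "
    "current_icons TEXT, time_stamp TEXT)"
)
conn.executemany("INSERT INTO weather_events VALUES (?, ?, ?, ?)", SAMPLE_ROWS)

rows = conn.execute(FULL_QUERY).fetchall()
for row in rows:
    print(row)
```

With this sample most hours contain two different descriptions with one event each, so the description returned for those hours depends on which of the tied rows SQLite picks. The averages are unaffected by that.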

So there you have it. Now I would like to address one question about the resulting query, namely:

Can the join be avoided?

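While the solution you chose (getting the descriptions and the average temperatures separately, then joining the two sets) is straightforward and makes perfect sense, it would be nice to avoid the join and do all the calculations in one pass. That would likely make the query faster, because the source table would be scanned only once. Can this be done?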

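As it happens, yes, it can. The main difficulty in combining the two parts is that the descriptions are obtained in two steps, whereas the average temperature is a single-step calculation. Simply putting AVG(current_temperatures) into the nested SELECT of the first CTE (grouped by hour and description) and then taking AVG of those results in the outer SELECT (grouped by hour) is not mathematically equivalent to taking AVG once over the whole hour group.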

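Instead, what you need to remember is that AVG = SUM / COUNT. If you get SUM and COUNT in the first step, then take the SUM of the SUMs and the SUM of the COUNTs in the second step, you can divide the first outer SUM by the second outer SUM to get the average.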
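
A small numeric check makes the difference concrete. This sketch uses Python's sqlite3 module and a hypothetical readings table (illustrative only, not part of the question) holding one hour's worth of data: two "Clear" readings and one "Rain" reading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with the readings of a single hour.
conn.execute("CREATE TABLE readings (description TEXT, temp INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("Clear", 40), ("Clear", 50), ("Rain", 90)],
)

# The average we actually want: (40 + 50 + 90) / 3.
true_avg = conn.execute("SELECT AVG(temp) FROM readings").fetchone()[0]

# Average of the per-description averages: (45 + 90) / 2, which is different.
avg_of_avgs = conn.execute(
    "SELECT AVG(group_avg) "
    "FROM (SELECT AVG(temp) AS group_avg FROM readings GROUP BY description)"
).fetchone()[0]

# Sum of the per-description sums over sum of the per-description counts.
sum_over_count = conn.execute(
    "SELECT SUM(total_temp) / SUM(event_count) "
    "FROM (SELECT SUM(temp) AS total_temp, COUNT(*) AS event_count "
    "      FROM readings GROUP BY description)"
).fetchone()[0]

print(true_avg, avg_of_avgs, sum_over_count)  # 60.0 67.5 60
```

The average of averages weights each description equally regardless of how many rows it has, which is why it drifts towards the single "Rain" reading here.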

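Here is the modified desc_table CTE that combines the two parts of the query (so it is no longer a CTE but the complete query). The necessary changes are the new total_temp column in the inner SELECT and the avg_temp expression in the outer one: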

SELECT
  SUM(total_temp) / SUM(event_count) AS avg_temp,
  current_weather_description,
  current_icons,
  MAX(event_count) AS max_event_count
FROM
  (
    SELECT
      strftime('%H', time_stamp, '+30 minutes') AS hour,
      current_weather_description,
      current_icons,
      COUNT(*) AS event_count,
      SUM(current_temperatures) AS total_temp
    FROM
      weather_events
    GROUP BY
      strftime('%H', time_stamp, '+30 minutes'),
      current_weather_description
  ) AS s
GROUP BY
  hour
ORDER BY
  hour
;

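Obviously, the max_event_count column is redundant in the output, yet it is still essential to the "greatest-N-per-group" method the query relies on. Personally, I would not worry about one redundant column in this situation, but if you have a good reason to exclude it from the result set, you can use the above query as a derived table (yes, again) and have the outermost SELECT pull all the columns except max_event_count, for instance like this: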

SELECT
  avg_temp,
  current_weather_description,
  current_icons
FROM
  (
    SELECT
      hour,
      SUM(total_temp) / SUM(event_count) AS avg_temp,
      current_weather_description,
      current_icons,
      MAX(event_count) AS max_event_count
    FROM
      (
        SELECT
          strftime('%H', time_stamp, '+30 minutes') AS hour,
          current_weather_description,
          current_icons,
          COUNT(*) AS event_count,
          SUM(current_temperatures) AS total_temp
        FROM
          weather_events
        GROUP BY
          strftime('%H', time_stamp, '+30 minutes'),
          current_weather_description
      ) AS s
    GROUP BY
      hour
  ) AS s
ORDER BY
  hour desc
;

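As you can see, the middle-level SELECT now also includes hour, which the outermost ORDER BY needs. (I am assuming here that the order matters to the calling application.)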

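I should mention just one difference between the results of the two approaches. In the first, AVG(current_temperatures) gives a floating-point result. In the second, SUM(total_temp) / SUM(event_count) gives an integer (assuming the temperatures are stored as integers). Since your expected results show integer averages, I guess that should not be a problem. However, if you later decide that you want more precision in your averages, just remember that you can replace the SUM function in either SUM(total_temp) or SUM(current_temperatures) with the TOTAL function, which returns the same value as SUM but always as a real. Dividing a real by an integer yields a real in SQLite, so with TOTAL you would get the same result as with AVG in the first approach.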
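
Here is a minimal sketch of that difference, using Python's sqlite3 module and two of the question's sample rows that fall into the same shifted hour ('04'). It assumes current_temperatures is declared as INTEGER. The {agg} placeholder is only a convenience of this sketch for swapping SUM and TOTAL in the inner query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the question does not show the schema.
conn.execute(
    "CREATE TABLE weather_events ("
    "current_temperatures INTEGER, current_weather_description TEXT, "
    "current_icons TEXT, time_stamp TEXT)"
)
conn.executemany(
    "INSERT INTO weather_events VALUES (?, ?, ?, ?)",
    [
        (49, "Rain", "rainicon", "2016-01-20 03:39:08"),
        (46, "Clear", "clearicon", "2016-01-20 04:15:29"),
    ],
)

# Single-scan query from the answer, trimmed to the avg_temp column.
QUERY = """
SELECT SUM(total_temp) / SUM(event_count) AS avg_temp
FROM (
  SELECT strftime('%H', time_stamp, '+30 minutes') AS hour,
         current_weather_description,
         COUNT(*) AS event_count,
         {agg}(current_temperatures) AS total_temp
  FROM weather_events
  GROUP BY strftime('%H', time_stamp, '+30 minutes'),
           current_weather_description
) AS s
GROUP BY hour
"""

with_sum = conn.execute(QUERY.format(agg="SUM")).fetchone()[0]
with_total = conn.execute(QUERY.format(agg="TOTAL")).fetchone()[0]
plain_avg = conn.execute(
    "SELECT AVG(current_temperatures) FROM weather_events"
).fetchone()[0]

print(with_sum, with_total, plain_avg)  # 47 47.5 47.5
```

With SUM the division is integer division and the half degree is dropped. With TOTAL the inner sums are reals, so the result matches AVG.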