M. *_*ith 2 sql sqlite android common-table-expression recursive-cte
This is my query:
WITH desc_table(counter, hourly, current_weather_description, current_icons, time_stamp) AS (
Select count(*) AS counter, CASE WHEN strftime('%M', 'now') < '30'
THEN strftime('%H', 'now')
ELSE strftime('%H', time_stamp, '+1 hours') END as hourly,
current_weather_description,
current_icons,
time_stamp
From weather_events
GROUP BY strftime('%H', time_stamp, '+30 minutes'), current_weather_description
UNION ALL
Select count(*) as counter, hourly - 1, current_weather_description, current_icons, time_stamp
From weather_events
GROUP BY strftime('%H', time_stamp, '+30 minutes'), current_weather_description
Order By counter desc limit 1
),
avg_temp_table(avg_temp, hour_seg, time_stamp) AS (
select avg(current_temperatures) as avg_temp, CASE WHEN strftime('%M', time_stamp) < '30'
THEN strftime('%H', time_stamp)
ELSE strftime('%H', time_stamp, '+1 hours') END as hour_seg,
time_stamp
from weather_events
group by strftime('%H', time_stamp, '+30 minutes')
order by hour_seg desc
)
Select hourly, current_weather_description
from desc_table
join avg_temp_table
on desc_table.hourly=avg_temp_table.hour_seg
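Basically, I have some weather data that I group into hourly intervals (offset by 30 minutes). I want to count how many times I got a particular weather description (and its matching icon) within each interval, and pick the description with the highest count for that interval (desc_table). Then I want to get the average temperature for the same time period (avg_temp_table) (maybe I need a subquery to compute this average instead of how I have it?) and join the two queries on their hour columns.
I want my anchor to be based on the time the query is generated (now) and to count the occurrences; each subsequent member should then subtract one hour, move to the next interval, count again, and so on.
Sample data; the regular data set {current_temperatures, current_weather_description, current_icons, time_stamp} will have many more rows in each time period: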
"87" "Rain" "rainicon" "2016-01-20 02:15:08"
"65" "Snow" "snowicon" "2016-01-20 02:39:08"
"49" "Rain" "rainicon" "2016-01-20 03:15:08"
"49" "Rain" "rainicon" "2016-01-20 03:39:08"
"46" "Clear" "clearicon" "2016-01-20 04:15:29"
"46" "Clear" "clearicon" "2016-01-20 04:38:53"
"46" "Cloudy" "cloudyicon" "2016-01-20 05:15:08"
"46" "Clear" "clearicon" "2016-01-20 05:39:08"
"45" "Clear" "clearicon" "2016-01-20 06:14:17"
"45" "Clear" "clearicon" "2016-01-20 06:34:23"
"45" "Clear" "clearicon" "2016-01-20 07:24:54"
"45" "Rain" "rainicon" "2016-01-20 07:44:41"
"43" "Rain" "rainicon" "2016-01-20 08:19:08"
"36" "Clear" "clearicon" "2016-01-20 08:39:08"
"35" "Meatballs" "meatballsicon" "2016-01-20 09:18:08"
"18" "Cloudy" "cloudyicon" "2016-01-20 09:39:08"
The output is a join between the average temperature for the interval (avg_temp_table) and the output of the first aggregate CTE (desc_table), giving {avg_temp, weather_description, current_icon}:
"87" "Rain" "rainicon"
"57" "Rain" "rainicon"
"47" "Clear" "clearicon"
"46" "Clear" "clearicon"
"46" "Cloudy" "cloudyicon"
"45" "Clear" "clearicon"
"44" "Rain" "rainicon"
"36" "Clear" "clearicon"
"18" "Cloudy" "cloudyicon"
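Right now I get a "no such column" error, because my anchor selects from my weather_events table and so does my recursive member. When I change the recursive member to select from desc_table, I get a "recursive aggregate queries not supported" error. But I don't want the recursive member to read from desc_table; I want to segment by hour, then step through each hourly interval and get the counts. I'm guessing I have also set up the anchor incorrectly.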
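I'm still not sure how your desc_table recursive CTE is supposed to pick the top weather description and its icon for each hour, but that's fine because, going by your verbal description, I think I've come up with a way to do the same thing without recursion.
First, group the rows by hour and description, and count the rows in each group: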
SELECT
strftime('%H', time_stamp, '+30 minutes') AS hour,
current_weather_description,
current_icons,
COUNT(*) AS event_count
FROM
weather_events
GROUP BY
strftime('%H', time_stamp, '+30 minutes'),
current_weather_description
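To make the 30-minute offset concrete, here is a minimal sketch using Python's built-in sqlite3 module. The helper name hour_bucket is hypothetical and is used only for illustration; the strftime expression is the one the queries group by. It shows that a timestamp in the first half of an hour stays in that hour's bucket, while one in the second half moves to the next bucket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def hour_bucket(time_stamp):
    """Return the bucket label for a timestamp (hypothetical helper).

    Uses the same expression as the GROUP BY in the queries:
    shift the timestamp forward by 30 minutes, then keep only the hour.
    """
    row = conn.execute(
        "SELECT strftime('%H', ?, '+30 minutes')", (time_stamp,)
    ).fetchone()
    return row[0]


# Two rows from the sample data that share the clock hour 02.
early = hour_bucket("2016-01-20 02:15:08")  # shifted to 02:45:08
late = hour_bucket("2016-01-20 02:39:08")   # shifted to 03:09:08

print(early, late)
```

So each bucket covers the half hour before and the half hour after the hour it is labelled with.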
Next, group the results of the above query by hour and get the maximum event count for each hour:
SELECT
hour,
MAX(event_count) AS max_event_count
FROM
(
SELECT
strftime('%H', time_stamp, '+30 minutes') AS hour,
current_weather_description,
current_icons,
COUNT(*) AS event_count
FROM
weather_events
GROUP BY
strftime('%H', time_stamp, '+30 minutes'),
current_weather_description
) AS s
GROUP BY
hour
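This is still not quite what you want, because you actually want the description and icon that match the maximum count rather than the count itself. Well, that is easy to fix: just add those columns to SELECT without adding them to GROUP BY: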
SELECT
hour,
current_weather_description,
current_icons,
MAX(event_count) AS max_event_count
FROM
(
SELECT
strftime('%H', time_stamp, '+30 minutes') AS hour,
current_weather_description,
current_icons,
COUNT(*) AS event_count
FROM
weather_events
GROUP BY
strftime('%H', time_stamp, '+30 minutes'),
current_weather_description
) AS s
GROUP BY
hour
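You still need to keep MAX(event_count) in the query for the trick to work. It works because in SQLite, when a SELECT statement contains a single MAX or a single MIN call, the value of any selected column that is neither in GROUP BY nor inside an aggregate is taken from the row that matches that MAX or MIN value. This non-standard extension of SQL is documented in the release notes for SQLite 3.7.11.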
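The behaviour described above can be checked on a tiny table. This is a minimal sketch using Python's built-in sqlite3 module; the table counts and its rows are made up for illustration and stand in for the output of the nested SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical pre-aggregated rows: one row per (hour, description) group.
conn.execute(
    "CREATE TABLE counts (hour TEXT, description TEXT, event_count INTEGER)"
)
conn.executemany(
    "INSERT INTO counts VALUES (?, ?, ?)",
    [
        ("08", "Rain", 1),
        ("08", "Clear", 3),
        ("09", "Cloudy", 2),
        ("09", "Snow", 1),
    ],
)

# description is neither grouped nor aggregated, so SQLite takes it
# from the row that holds MAX(event_count) within each hour.
rows = conn.execute(
    """
    SELECT hour, description, MAX(event_count)
    FROM counts
    GROUP BY hour
    ORDER BY hour
    """
).fetchall()

print(rows)
```

Each hour comes back with the description of its highest-count row. When two rows in a group tie for the maximum, nothing in the query says which of them supplies the bare columns.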
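So much for desc_table. As for the avg_temp_table CTE, there does not seem to be anything wrong with your current approach, except that, for consistency, I would probably use the GROUP BY expression as the hour definition instead of the CASE expression you are using. The time_stamp column in the result also seems redundant. So the slightly modified CTE would look like this: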
SELECT
strftime('%H', time_stamp, '+30 minutes') AS hour,
AVG(current_temperatures) AS avg_temp
FROM
weather_events
GROUP BY
strftime('%H', time_stamp, '+30 minutes')
Now you just need to join the two sets on the hour column and select the relevant columns for the final output:
SELECT
t.avg_temp,
d.current_weather_description,
d.current_icons
FROM
avg_temp_table AS t
INNER JOIN desc_table AS d on t.hour = d.hour
ORDER BY
t.hour
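For reference, the pieces above can be assembled into one statement and run against the sample data from the question. This is a sketch using Python's built-in sqlite3 module, not a definitive version: the column types in CREATE TABLE are assumptions (the question does not show the schema), and t.hour is added to the final SELECT only so the rows can be labelled.

```python
import sqlite3

# Sample rows from the question:
# (current_temperatures, current_weather_description, current_icons, time_stamp)
ROWS = [
    (87, "Rain", "rainicon", "2016-01-20 02:15:08"),
    (65, "Snow", "snowicon", "2016-01-20 02:39:08"),
    (49, "Rain", "rainicon", "2016-01-20 03:15:08"),
    (49, "Rain", "rainicon", "2016-01-20 03:39:08"),
    (46, "Clear", "clearicon", "2016-01-20 04:15:29"),
    (46, "Clear", "clearicon", "2016-01-20 04:38:53"),
    (46, "Cloudy", "cloudyicon", "2016-01-20 05:15:08"),
    (46, "Clear", "clearicon", "2016-01-20 05:39:08"),
    (45, "Clear", "clearicon", "2016-01-20 06:14:17"),
    (45, "Clear", "clearicon", "2016-01-20 06:34:23"),
    (45, "Clear", "clearicon", "2016-01-20 07:24:54"),
    (45, "Rain", "rainicon", "2016-01-20 07:44:41"),
    (43, "Rain", "rainicon", "2016-01-20 08:19:08"),
    (36, "Clear", "clearicon", "2016-01-20 08:39:08"),
    (35, "Meatballs", "meatballsicon", "2016-01-20 09:18:08"),
    (18, "Cloudy", "cloudyicon", "2016-01-20 09:39:08"),
]

QUERY = """
WITH desc_table AS (
    SELECT hour, current_weather_description, current_icons,
           MAX(event_count) AS max_event_count
    FROM (
        SELECT strftime('%H', time_stamp, '+30 minutes') AS hour,
               current_weather_description,
               current_icons,
               COUNT(*) AS event_count
        FROM weather_events
        GROUP BY strftime('%H', time_stamp, '+30 minutes'),
                 current_weather_description
    ) AS s
    GROUP BY hour
),
avg_temp_table AS (
    SELECT strftime('%H', time_stamp, '+30 minutes') AS hour,
           AVG(current_temperatures) AS avg_temp
    FROM weather_events
    GROUP BY strftime('%H', time_stamp, '+30 minutes')
)
SELECT t.hour, t.avg_temp, d.current_weather_description, d.current_icons
FROM avg_temp_table AS t
INNER JOIN desc_table AS d ON t.hour = d.hour
ORDER BY t.hour
"""

conn = sqlite3.connect(":memory:")
# Assumed schema: the question only lists the column names.
conn.execute(
    """
    CREATE TABLE weather_events (
        current_temperatures INTEGER,
        current_weather_description TEXT,
        current_icons TEXT,
        time_stamp TEXT
    )
    """
)
conn.executemany("INSERT INTO weather_events VALUES (?, ?, ?, ?)", ROWS)

# Map each hour bucket to (avg_temp, description, icon).
result = {
    hour: (avg_temp, description, icon)
    for hour, avg_temp, description, icon in conn.execute(QUERY)
}

for hour in sorted(result):
    print(hour, *result[hour])
```

With so few rows, several buckets contain two different descriptions with one event each. For those buckets the query does not determine which description is returned, so only buckets with a clear winner are a meaningful check.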
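So there you have it. Now I would like to address one issue with the resulting query, namely:
While the solution you have adopted (getting the descriptions and the average temperatures separately and then joining the two sets) is simple and makes perfect sense, it would be nice to avoid the join and do all the calculations in one go. That would most likely make the query faster, since the source would be scanned only once. Can that be achieved?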
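As it happens, yes, it can. The main difficulty in combining the two parts is that the descriptions are obtained in two steps, whereas calculating the average temperature is a single-step operation. Simply putting AVG(current_temperatures) into the nested SELECT of the first CTE (grouped by hour and description) and then applying AVG to those results in the outer SELECT (grouped by hour) is not mathematically equivalent to performing AVG once over the entire hour group.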
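Instead, what you need to remember is that AVG = SUM / COUNT. If you get SUM and COUNT in the first step, and then the SUM of the SUMs and the SUM of the COUNTs in the second step, you can divide the first outer SUM by the second outer SUM to get the average.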
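A small numeric check illustrates both points. This is a minimal sketch using Python's built-in sqlite3 module; the readings table and its values are made up for illustration and represent a single hour bucket with groups of unequal size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical readings for one hour: three "Rain" rows and one "Clear" row.
conn.execute("CREATE TABLE readings (description TEXT, temp INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("Rain", 40), ("Rain", 50), ("Rain", 60), ("Clear", 10)],
)

# Single-step average over the whole hour: 160 / 4.
true_avg = conn.execute("SELECT AVG(temp) FROM readings").fetchone()[0]

# Two-step versions, both computed over the per-description groups.
avg_of_avgs, sum_over_count = conn.execute(
    """
    SELECT AVG(group_avg),
           SUM(total_temp) / SUM(event_count)
    FROM (
        SELECT description,
               AVG(temp) AS group_avg,
               SUM(temp) AS total_temp,
               COUNT(*) AS event_count
        FROM readings
        GROUP BY description
    ) AS s
    """
).fetchone()

print(true_avg, avg_of_avgs, sum_over_count)
```

The average of the two group averages (50 and 10) is 30, because it weights the single Clear row as heavily as the three Rain rows, whereas the sum of sums divided by the sum of counts reproduces the single-step value of 40.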
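Here is the new, modified desc_table CTE that combines the two parts of the query (so it is no longer a CTE but the complete query). The necessary changes are the avg_temp expression in the outer SELECT and the total_temp column in the nested one: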
SELECT
SUM(total_temp) / SUM(event_count) AS avg_temp,
current_weather_description,
current_icons,
MAX(event_count) AS max_event_count
FROM
(
SELECT
strftime('%H', time_stamp, '+30 minutes') AS hour,
current_weather_description,
current_icons,
COUNT(*) AS event_count,
SUM(current_temperatures) AS total_temp
FROM
weather_events
GROUP BY
strftime('%H', time_stamp, '+30 minutes'),
current_weather_description
) AS s
GROUP BY
hour
ORDER BY
hour
;
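Obviously, the max_event_count column is redundant for the output, yet it is still essential for the "greatest-N-per-group" approach the query relies on. Personally, I would not worry about one redundant column in this case, but if you have a good reason to exclude it from the result set, you can use the above query as a derived table (yes, again) and have the outermost SELECT pull all columns except max_event_count, for instance like this: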
SELECT
avg_temp,
current_weather_description,
current_icons
FROM
(
SELECT
hour,
SUM(total_temp) / SUM(event_count) AS avg_temp,
current_weather_description,
current_icons,
MAX(event_count) AS max_event_count
FROM
(
SELECT
strftime('%H', time_stamp, '+30 minutes') AS hour,
current_weather_description,
current_icons,
COUNT(*) AS event_count,
SUM(current_temperatures) AS total_temp
FROM
weather_events
GROUP BY
strftime('%H', time_stamp, '+30 minutes'),
current_weather_description
) AS s
GROUP BY
hour
) AS s
ORDER BY
hour desc
;
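As you can see, the middle-tier SELECT now also includes hour, which is needed by the outermost ORDER BY. (I am assuming here that the order matters to the calling application.)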
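There is just one difference between the results of the two approaches that I should mention. In the first one, AVG(current_temperatures) gives a floating-point result. In the second, SUM(total_temp) / SUM(event_count) gives an integer. Since your expected results show integer averages, I guess that should not be a problem. However, if you later decide that you want more precision for your averages, just remember that you can replace the SUM function in either SUM(total_temp) or SUM(current_temperatures) with the TOTAL function, which returns the same value as SUM but always as a real. Dividing a real by an integer yields a real in SQLite, so with TOTAL you would get the same result as with AVG in the first approach.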
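The difference is easy to see with two integer readings. This is a minimal sketch using Python's built-in sqlite3 module; the table temps, its column t, and its values are made up for illustration, and the column is assumed to hold integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical integer temperatures for one hour bucket.
conn.execute("CREATE TABLE temps (t INTEGER)")
conn.executemany("INSERT INTO temps VALUES (?)", [(45,), (46,)])

# SUM of integers is an integer, so SUM / COUNT is integer division.
# TOTAL always returns a real, so TOTAL / COUNT keeps the fraction.
sum_div, total_div, avg = conn.execute(
    "SELECT SUM(t) / COUNT(*), TOTAL(t) / COUNT(*), AVG(t) FROM temps"
).fetchone()

print(sum_div, total_div, avg)
```

Here SUM(t) / COUNT(*) truncates 91 / 2 to 45, while the TOTAL version and AVG both give 45.5.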