Postgres version 9.4.18, PostGIS version 2.2.
Below are the tables I'm working with (major changes to the table structure are unlikely):
Table ltg_data (spanning 1988 to 2018):
Column | Type | Modifiers
----------+--------------------------+-----------
intensity | integer | not null
time | timestamp with time zone | not null
lon | numeric(9,6) | not null
lat | numeric(8,6) | not null
ltg_geom | geometry(Point,4269) |
Indexes:
"ltg_data2_ltg_geom_idx" gist (ltg_geom)
"ltg_data2_time_idx" btree ("time")
Size of ltg_data (~800M rows):
ltg=# select pg_relation_size('ltg_data');
pg_relation_size
------------------
149729288192
Table counties:
Column | Type | Modifiers
-----------+-----------------------------+--------------------------------------------------------
gid | integer | not null default nextval('counties_gid_seq'::regclass)
objectid_1 | integer |
objectid | integer |
state | character varying(2) |
cwa | character varying(9) |
countyname | character varying(24) |
fips | character varying(5) |
time_zone | character varying(2) |
fe_area | character varying(2) |
lon | double precision |
lat | double precision |
the_geom | geometry(MultiPolygon,4269) |
Indexes:
"counties_pkey" PRIMARY KEY, btree (gid)
"counties_gix" gist (the_geom)
"county_cwa_idx" btree (cwa)
"countyname_cwa_idx" btree (countyname)
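I have a query that counts the total number of rows for each day of the year (month-day) across the 30 years. With help from Stack Overflow, the query that gets these counts works fine. Here are the query and its results; the query uses the following function.
Function: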
CREATE FUNCTION f_mmdd(date) RETURNS int LANGUAGE sql IMMUTABLE AS
$$SELECT to_char($1, 'MMDD')::int$$;
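As a quick sanity check of what the helper returns, the following sketch calls f_mmdd (assumed to exist exactly as defined above) on a few arbitrary dates. The dates themselves are only illustrative.

```sql
-- f_mmdd drops the year; the cast to int also drops a leading zero.
SELECT f_mmdd(date '1990-07-25') AS a   -- 725
     , f_mmdd(date '2018-07-25') AS b   -- 725
     , f_mmdd(date '2018-12-31') AS c;  -- 1231
```

One consequence worth keeping in mind: February 29 maps to 229, and a non-leap dummy year such as 2018 never generates that value.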
Query:
SELECT d.mmdd, COALESCE(ct.ct, 0) AS total_count
FROM (
SELECT f_mmdd(d::date) AS mmdd -- ignoring the year
FROM generate_series(timestamp '2018-01-01' -- any dummy year
, timestamp '2018-12-31'
, interval '1 day') d
) d
LEFT JOIN (
SELECT f_mmdd(time::date) AS mmdd, count(*) AS ct
FROM counties c
JOIN ltg_data d ON ST_contains(c.the_geom, d.ltg_geom)
WHERE cwa = 'MFR'
GROUP BY 1
) ct USING (mmdd)
ORDER BY 1;
Results:
mmdd | total_count
725 | 2126
726 | 558
727 | 2
728 | 2
729 | 2
730 | 0
731 | 0
801 | 0
802 | 10
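Desired results: I'm trying to find other statistics about the counts for the days of the year. For instance, I know that for July 25 (725 in the table below) the total count over all the years in the table is 2126. What I'm looking for, for July 25 (725), is the maximum daily count, the minimum, the percentage of years in which count(*) for that day is not zero, and percentiles (10th, 25th, 50th, 75th, and 90th; stdev would be useful too). It would also be good to see in which year max_daily occurred. I suppose that if a day has no counts in any year, year_max_daily would be blank or zero.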
mmdd | total_count | max_daily | year_max_daily | percent_years_count_not_zero | 10th percentile_daily | 90th percentile_daily
 725 |        2126 |      1000 |           1990 |                           30 |                    15 |                   900
 726 |         558 |       120 |           1992 |                           20 |                    10 |                    80
 727 |           2 |         1 |           1991 |                            2 |                     0 |                     1
 728 |           2 |         1 |           1990 |                            2 |                     0 |                     1
 729 |           2 |         1 |           1989 |                            2 |                     0 |                     1
 730 |           0 |         0 |                |                            0 |                     0 |                     0
 731 |           0 |         0 |                |                            0 |                     0 |                     0
 801 |           0 |         0 |                |                            0 |                     0 |                     0
 802 |          10 |        10 |           1990 |                            0 |                     1 |                     8
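What I've tried so far simply doesn't work: it returns the same results as the total. I think that's because I'm only taking an average after the totals have already been computed, so I'm not really looking at the count for each day of each year and averaging those.
Attempt: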
SELECT AVG(CAST(total_count as FLOAT)), day
FROM
(
SELECT d.mmdd as day, COALESCE(ct.ct, 0) as total_count
FROM (
SELECT f_mmdd(d::date) AS mmdd
FROM generate_series(timestamp '2018-01-01', timestamp '2018-12-31', interval '1 day') d
) d
LEFT JOIN (
SELECT mmdd, avg(q.ct) FROM (
SELECT f_mmdd((time at time zone 'utc+12')::date) as mmdd, count(*) as ct
FROM counties c
JOIN ltg_data d on ST_contains(c.the_geom, d.ltg_geom)
WHERE cwa = 'MFR'
GROUP BY 1
)
) as q
ct USING (mmdd)
ORDER BY 1
Thanks for any help!
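I haven't included calculations for every requested statistic, since that is too much for one question, but I hope you'll be able to extend the query below and add the extra statistics you need.
I'm using CTEs below to keep the query readable. If you prefer, you can put it all into one huge query. I recommend running the query step by step, CTE by CTE, and checking the intermediate results to understand how it works.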
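As one way to follow that step-by-step advice, a single CTE can be run on its own with a small summary query on top. The sketch below does this for the date list only; the aliases n_days, first_day, and last_day are made up for illustration.

```sql
-- Run only the date-generating CTE and inspect its shape.
WITH CTE_Dates AS (
    SELECT d::date AS dt
    FROM generate_series(timestamp '1988-01-01', timestamp '2018-12-31', interval '1 day') AS d
)
SELECT count(*) AS n_days, min(dt) AS first_day, max(dt) AS last_day
FROM CTE_Dates;
-- expect 11323 rows: 31 calendar years (1988 through 2018), 8 of them leap years
```

The same pattern (keep the CTEs up to the one of interest, then SELECT from it) should work for each later stage.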
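CTE_Dates is a simple list of all possible dates over the 30 years.
CTE_DailyCounts is a list of basic counts for each day over the 30 years (I took your existing query for that).
CTE_FullStats is again a list of all dates, together with some statistics computed for each (month, day) using window functions partitioned by month and day. ROW_NUMBER is used there to find, for each (month, day), the date (and therefore the year) on which the count was the largest.
The final query selects only that one row with the largest count per (month, day), along with the rest of the information.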
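The ROW_NUMBER step can be tried in isolation on a few made-up rows, with no tables needed. In this sketch the CTE name toy_counts and its values are invented for illustration, and NULL stands for a year with no matching rows after the LEFT JOIN. The NULLS LAST is an addition of this sketch: PostgreSQL sorts NULLs first in a descending sort by default, so without it a year with no data could be numbered 1.

```sql
-- Made-up daily counts for one calendar day (July 25) in four years.
WITH toy_counts(dt, ct) AS (
    VALUES (date '1988-07-25', 5)
         , (date '1989-07-25', NULL)
         , (date '1990-07-25', 1000)
         , (date '1991-07-25', 120)
)
, ranked AS (
    SELECT dt
         , MAX(ct) OVER (PARTITION BY EXTRACT(MONTH FROM dt), EXTRACT(DAY FROM dt)) AS max_daily
         , ROW_NUMBER() OVER (PARTITION BY EXTRACT(MONTH FROM dt), EXTRACT(DAY FROM dt)
                              ORDER BY ct DESC NULLS LAST) AS rn
    FROM toy_counts
)
SELECT EXTRACT(YEAR FROM dt) AS year_max_daily, max_daily
FROM ranked
WHERE rn = 1;
-- returns one row: 1990 | 1000
```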
I haven't tried running the query below, because the question has no sample data, so there may be some typos.
WITH
CTE_Dates
AS
(
SELECT
d::date AS dt
,EXTRACT(MONTH FROM d::date) AS dtMonth
,EXTRACT(DAY FROM d::date) AS dtDay
,EXTRACT(YEAR FROM d::date) AS dtYear
FROM
generate_series(timestamp '1988-01-01', timestamp '2018-12-31', interval '1 day') AS d
-- full range of possible dates
)
,CTE_DailyCounts
AS
(
SELECT
time::date AS dt
,count(*) AS ct
FROM
counties c
INNER JOIN ltg_data d ON ST_contains(c.the_geom, d.ltg_geom)
WHERE cwa = 'MFR'
GROUP BY time::date
)
,CTE_FullStats
AS
(
SELECT
CTE_Dates.dt
,CTE_Dates.dtMonth
,CTE_Dates.dtDay
,CTE_Dates.dtYear
,CTE_DailyCounts.ct
,SUM(CTE_DailyCounts.ct) OVER (PARTITION BY dtMonth, dtDay) AS total_count
,MAX(CTE_DailyCounts.ct) OVER (PARTITION BY dtMonth, dtDay) AS max_daily
,SUM(CASE WHEN CTE_DailyCounts.ct > 0 THEN 1 ELSE 0 END) OVER (PARTITION BY dtMonth, dtDay) AS nonzero_day_count
,COUNT(*) OVER (PARTITION BY dtMonth, dtDay) AS years_count
,100.0 * SUM(CASE WHEN CTE_DailyCounts.ct > 0 THEN 1 ELSE 0 END) OVER (PARTITION BY dtMonth, dtDay)
/ COUNT(*) OVER (PARTITION BY dtMonth, dtDay) AS percent_years_count_not_zero
,ROW_NUMBER() OVER (PARTITION BY dtMonth, dtDay ORDER BY CTE_DailyCounts.ct DESC NULLS LAST) AS rn -- NULLS LAST: dates with no rows must not rank first
FROM
CTE_Dates
LEFT JOIN CTE_DailyCounts ON CTE_DailyCounts.dt = CTE_Dates.dt
)
SELECT
dtMonth
,dtDay
,total_count
,max_daily
,dtYear AS year_max_daily
,percent_years_count_not_zero
FROM
CTE_FullStats
WHERE
rn = 1
ORDER BY
dtMonth
,dtDay
;
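The percentiles and standard deviation that are left out above could be added along the following lines. This is only a sketch, untested against the real data: it reuses the table and column names from the question, treats dates with no rows as a count of 0 (an assumption; excluding them would give different percentiles), and uses the ordered-set aggregate percentile_cont, which is available from PostgreSQL 9.4. The output names min_daily, percentiles_daily, and stdev_daily are made up.

```sql
WITH CTE_Dates AS (
    SELECT d::date AS dt
    FROM generate_series(timestamp '1988-01-01', timestamp '2018-12-31', interval '1 day') AS d
)
, CTE_DailyCounts AS (
    SELECT "time"::date AS dt, count(*) AS ct
    FROM counties c
    INNER JOIN ltg_data d ON ST_Contains(c.the_geom, d.ltg_geom)
    WHERE cwa = 'MFR'
    GROUP BY "time"::date
)
SELECT EXTRACT(MONTH FROM dd.dt) AS dtMonth
     , EXTRACT(DAY FROM dd.dt) AS dtDay
     , MIN(COALESCE(dc.ct, 0)) AS min_daily
       -- one array per (month, day): 10th, 25th, 50th, 75th, 90th percentile
     , percentile_cont(ARRAY[0.10, 0.25, 0.50, 0.75, 0.90]::double precision[])
           WITHIN GROUP (ORDER BY COALESCE(dc.ct, 0)::double precision) AS percentiles_daily
     , stddev_samp(COALESCE(dc.ct, 0)) AS stdev_daily
FROM CTE_Dates dd
LEFT JOIN CTE_DailyCounts dc ON dc.dt = dd.dt
GROUP BY 1, 2
ORDER BY 1, 2;
```

This uses plain GROUP BY rather than window functions because ordered-set aggregates cannot be used with OVER. The result can be joined back to the main query on (dtMonth, dtDay) if everything is wanted in one row.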