awk*_*987 1 sql database window-functions gaps-and-islands google-bigquery
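I have a table of data about the flight-booking patterns of my website's users. Let's assume the data below is all the historical data I have about my users.
session_date is the day the user came to the website and searched for a particular route, and flight_date is the flight's departure date. I have ordered the table by session_date. Whether the search led to a booking is recorded in booked.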
+---------+--------------+----------------+--------------+-------------+--------+
| user_id | session_date | departure_code | arrival_code | flight_date | booked |
+---------+--------------+----------------+--------------+-------------+--------+
| user1   | 7 Jan        | CA             | MY           | 8 Mar       | 1      |
| user1   | 8 Jan        | US             | MY           | 18 May      | 0      |
| user1   | 8 Jan        | US             | MY           | 18 May      | 1      |
| user1   | 8 Jan        | CA             | MY           | 19 Mar      | 0      |
| user1   | 9 Jan        | US             | MY           | 18 May      | 1      |
+---------+--------------+----------------+--------------+-------------+--------+
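I want to output a new column in my table called previous_flight_date. For each search, the new column should state the flight_date that was previously booked for that particular route. If the user has searched the same route several times but never booked it, the value in this column is null.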
+---------+--------------+----------------+--------------+-------------+--------+----------------------+
| user_id | session_date | departure_code | arrival_code | flight_date | booked | previous_flight_date |
+---------+--------------+----------------+--------------+-------------+--------+----------------------+
| user1   | 7 Jan        | CA             | SG           | 8 Mar       | 1      | null                 |
| user1   | 8 Jan        | US             | MY           | 18 May      | 0      | null                 |
| user1   | 8 Jan        | US             | MY           | 18 May      | 1      | null                 |
| user1   | 8 Jan        | CA             | SG           | 19 Mar      | 0      | 8 Mar                |
| user1   | 2 Feb        | US             | MY           | 2 Jul       | 1      | 18 May               |
+---------+--------------+----------------+--------------+-------------+--------+----------------------+
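So, for example, the column stays null until row 4, where it shows "8 Mar", because by then the user had already booked a CA-->SG flight departing on that date.
I tried LAST_VALUE, but it did not work. I also don't know how to use LAG() when there are many different routes and I want to look up a previous row under a condition. If anyone has a solution, that would be great! Thank you.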
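I think you can do this with first_value(). The trick is to put a conditional expression inside the window function, turn on the ignore nulls option, and then use a window frame specification that looks back at the earlier rows with the same departure/arrival, excluding the current row: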
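select
    t.*,
    -- flight_date of the most recent earlier booking on the same route;
    -- rows with booked = 0 yield null and are skipped by ignore nulls
    first_value(case when booked = 1 then flight_date end ignore nulls) over(
        partition by user_id, departure_code, arrival_code
        -- newest session first, so the frame below covers only earlier sessions
        -- (rows sharing a session_date have no guaranteed relative order)
        order by session_date desc
        rows between 1 following and unbounded following
    ) previous_flight_date
from mytable t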
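To make the look-back semantics concrete, here is a minimal plain-Python sketch of what the window is meant to compute. It is not BigQuery code: it assumes the rows are already sorted by session_date (oldest first), it reuses the sample rows from the expected output in the question, and the names add_previous_flight_date, COLUMNS and SAMPLE are made up for illustration.

```python
def add_previous_flight_date(rows):
    """Attach previous_flight_date to every row.

    Assumes `rows` is already sorted by session_date, oldest first.
    """
    # (user_id, departure_code, arrival_code) -> flight_date of the latest booking seen so far
    last_booked = {}
    result = []
    for row in rows:
        key = (row["user_id"], row["departure_code"], row["arrival_code"])
        # Read the stored value BEFORE looking at the current row,
        # which is what "excluding the current row" means in the window frame.
        result.append({**row, "previous_flight_date": last_booked.get(key)})
        # Only booked rows are remembered; unbooked searches are skipped,
        # playing the role of `ignore nulls`.
        if row["booked"] == 1:
            last_booked[key] = row["flight_date"]
    return result


COLUMNS = ("user_id", "session_date", "departure_code",
           "arrival_code", "flight_date", "booked")

# Sample rows copied from the expected-output table in the question.
SAMPLE = [dict(zip(COLUMNS, values)) for values in [
    ("user1", "7 Jan", "CA", "SG", "8 Mar", 1),
    ("user1", "8 Jan", "US", "MY", "18 May", 0),
    ("user1", "8 Jan", "US", "MY", "18 May", 1),
    ("user1", "8 Jan", "CA", "SG", "19 Mar", 0),
    ("user1", "2 Feb", "US", "MY", "2 Jul", 1),
]]

for row in add_previous_flight_date(SAMPLE):
    print(row["session_date"], row["departure_code"],
          row["arrival_code"], row["previous_flight_date"])
```

The dictionary keyed by user and route stands in for partition by, and reading it before updating it stands in for the frame that stops one row short of the current one.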
Actually, a window max() would also work (and then there is no need for ignore nulls):
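select
    t.*,
    -- latest flight_date among earlier bookings on the same route;
    -- max() skips nulls on its own, so unbooked rows are ignored.
    -- This equals the most recent booking's date only if flight dates
    -- grow with each booking.
    max(case when booked = 1 then flight_date end) over(
        partition by user_id, departure_code, arrival_code
        order by session_date
        rows between unbounded preceding and 1 preceding
    ) previous_flight_date
from mytable t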
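The max() variant can be tried locally. Below is a rough sketch that uses Python's built-in sqlite3 module as a stand-in for BigQuery, which is an assumption on my part (it needs SQLite 3.25 or later for window functions, and SQLite dialect details may differ from BigQuery). The year 2021 and the row_id tie-breaker column are invented for the demo, since the question gives neither, and the window is ordered by session_date so that "preceding" means earlier sessions.

```python
import sqlite3

# Sample rows from the expected-output table in the question.
# Dates are ISO strings with an ASSUMED year (2021) so they sort correctly.
ROWS = [
    ("user1", "2021-01-07", "CA", "SG", "2021-03-08", 1),
    ("user1", "2021-01-08", "US", "MY", "2021-05-18", 0),
    ("user1", "2021-01-08", "US", "MY", "2021-05-18", 1),
    ("user1", "2021-01-08", "CA", "SG", "2021-03-19", 0),
    ("user1", "2021-02-02", "US", "MY", "2021-07-02", 1),
]

QUERY = """
select
    t.*,
    max(case when booked = 1 then flight_date end) over (
        partition by user_id, departure_code, arrival_code
        -- row_id is a hypothetical tie-breaker for rows on the same session_date
        order by session_date, row_id
        rows between unbounded preceding and 1 preceding
    ) as previous_flight_date
from mytable t
order by row_id
"""


def previous_flight_dates(rows):
    """Load rows into an in-memory table and return the computed column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable ("
        " row_id integer primary key,"  # auto-assigned in insertion order
        " user_id text, session_date text, departure_code text,"
        " arrival_code text, flight_date text, booked integer)"
    )
    conn.executemany(
        "insert into mytable (user_id, session_date, departure_code,"
        " arrival_code, flight_date, booked) values (?, ?, ?, ?, ?, ?)",
        rows,
    )
    # previous_flight_date is the last column of each result row.
    result = [record[-1] for record in conn.execute(QUERY)]
    conn.close()
    return result


print(previous_flight_dates(ROWS))
```

The first three rows come back as None (SQL null) and the last two carry the flight date of the earlier booking on the same route, matching the expected output in the question.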