BigQuery: How do I query rolling monthly active/churned users?

Liv*_*ire 2 sql timestamp google-bigquery

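I have a website with news articles, and I am trying to count 4 user types for each month. The user types are:

1. New user: a user who signed up (viewed an article for the first time) in the current month and viewed an article in the current month.

2. Retained user: a new user from the previous month, or a user who viewed an article in both the previous month and the current month.

3. Churned user: a new or retained user from the previous month who has not viewed an article in the current month, or a user who was already churned in the previous month.

4. Resurrected user: a user who was churned in the previous month and viewed an article in the current month.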

**User Table A - Unique User Article Views**
- Current month = 2019-04-01 00:00:00 UTC

| user_id    | viewed_at                 |
------------------------------------------
| 4          | 2019-04-01 00:00:00 UTC   |
| 3          | 2019-04-01 00:00:00 UTC   |
| 2          | 2019-04-01 00:00:00 UTC   |
| 1          | 2019-03-01 00:00:00 UTC   |
| 3          | 2019-03-01 00:00:00 UTC   |
| 2          | 2019-02-01 00:00:00 UTC   |
| 1          | 2019-02-01 00:00:00 UTC   |
| 1          | 2019-01-01 00:00:00 UTC   |


The table above outlines the following user types:

2019-01-01
* User 1: New

2019-02-01
* User 1: Retained
* User 2: New

2019-03-01
* User 1: Retained
* User 2: Churned
* User 3: New

2019-04-01
* User 1: Churned
* User 2: Resurrected
* User 3: Retained
* User 4: New


The table I want counts the distinct user_id values for each user type in each month.

| month_viewed_at           | ut_new | ut_retained | ut_churned | ut_resurrected
------------------------------------------------------------------------------------
| 2019-04-01 00:00:00 UTC   | 1      | 1           | 1          | 1
| 2019-03-01 00:00:00 UTC   | 1      | 1           | 1          | 0
| 2019-02-01 00:00:00 UTC   | 1      | 1           | 0          | 0
| 2019-01-01 00:00:00 UTC   | 1      | 0           | 0          | 0 


Mik*_*ant 5

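"I just don't know where to start"

I was hoping you would read all my comments and try something yourself, but since I don't see any updates, I guess you are still stuck here, so here we go...

Below is for BigQuery Standard SQL and should give you a direction.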

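#standardSQL
WITH temp1 AS (
  -- Month label for the output, plus month indexes (months since 2000-01-01)
  -- for the view itself (pos) and for the user's first view (first_pos).
  SELECT user_id,
    FORMAT_DATE('%Y-%m', DATE(viewed_at)) month_viewed_at,
    DATE_DIFF(DATE(viewed_at), '2000-01-01', MONTH) pos,
    DATE_DIFF(DATE(MIN(viewed_at) OVER(PARTITION BY user_id)), '2000-01-01', MONTH) first_pos
  FROM `project.dataset.table`
), temp2 AS (
  -- New user: the view falls in the user's first active month.
  SELECT *, pos = first_pos AS new_user
  FROM temp1
), temp3 AS (
  -- Retained user: the same user also has a row for the previous month (pos - 1).
  SELECT *, LAST_VALUE(new_user) OVER(win) OR pos - 1 = LAST_VALUE(pos) OVER(win) AS retained_user
  FROM temp2
  WINDOW win AS (PARTITION BY user_id ORDER BY pos RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING)
)
-- Counts rows, so this assumes one row per user per month, as in the sample data.
SELECT month_viewed_at,
  COUNTIF(new_user) AS new_users,
  COUNTIF(retained_user) AS retained_users
FROM temp3
GROUP BY month_viewed_at
-- ORDER BY month_viewed_at DESC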

If applied to your sample data, the result is

Row month_viewed_at new_users   retained_users   
1   2019-04         1           1    
2   2019-03         1           1    
3   2019-02         1           1    
4   2019-01         1           0    

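In temp1 we prepare the data: viewed_at is formatted the way it should be presented in the output, and it is also converted into pos, the number of months since an arbitrary reference date (2000-01-01), so that we can use analytic functions with RANGE as opposed to ROWS. In temp2 we simply identify new users, and in temp3, retained users.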
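To illustrate why the month index is combined with RANGE rather than ROWS, here is a small self-contained sketch. The inline data is hypothetical and not from the question: a single user with views in months 228, 229 and 231 (January, February and April 2019 when counted from 2000-01-01), so month 230 is a gap.

```sql
#standardSQL
-- Hypothetical inline data: user 1 is active in months 228, 229 and 231 (230 is skipped).
WITH activity AS (
  SELECT 1 AS user_id, pos
  FROM UNNEST([228, 229, 231]) AS pos
)
SELECT user_id, pos,
  -- ROWS: the physically previous row, whichever month it belongs to.
  LAST_VALUE(pos) OVER(PARTITION BY user_id ORDER BY pos
    ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_row_pos,
  -- RANGE: only rows whose pos equals the current pos - 1, i.e. the previous calendar month.
  LAST_VALUE(pos) OVER(PARTITION BY user_id ORDER BY pos
    RANGE BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_month_pos
FROM activity
ORDER BY pos
```

For pos 231, prev_row_pos should come back as 229 while prev_month_pos should be NULL, because no row sits exactly one month earlier. That NULL is what lets the retained check skip users who were not active in the previous month.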

I think this is a good start, so I am leaving the rest to you.
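For reference, one possible way to fill in the two remaining columns is sketched below. It is not part of the query above and takes a different route: instead of RANGE frames, it builds one row per user for every month from the user's first view up to the latest month in the data, because a churned user has no row in the source table for the month being counted. The CTE names (views, bounds, latest, spine, flags, typed) are made up for illustration, and the sketch assumes that "churned" means "no view this month, at any point after the first month", which is how I read the definitions and the expected table in the question.

```sql
#standardSQL
WITH views AS (
  -- One row per user per active month.
  SELECT DISTINCT user_id, DATE_TRUNC(DATE(viewed_at), MONTH) AS activity_month
  FROM `project.dataset.table`
), bounds AS (
  -- First active month of each user.
  SELECT user_id, MIN(activity_month) AS first_month
  FROM views
  GROUP BY user_id
), latest AS (
  -- Latest month present in the data (single row).
  SELECT MAX(activity_month) AS last_month
  FROM views
), spine AS (
  -- Every month from the user's first month to the latest month, active or not.
  SELECT b.user_id, b.first_month, activity_month
  FROM bounds AS b
  CROSS JOIN latest AS l
  CROSS JOIN UNNEST(GENERATE_DATE_ARRAY(b.first_month, l.last_month, INTERVAL 1 MONTH)) AS activity_month
), flags AS (
  -- Mark whether the user actually viewed an article in that month.
  SELECT s.user_id, s.first_month, s.activity_month,
    v.user_id IS NOT NULL AS active
  FROM spine AS s
  LEFT JOIN views AS v
    ON v.user_id = s.user_id AND v.activity_month = s.activity_month
), typed AS (
  -- The spine has no gaps, so LAG is always the previous calendar month
  -- (NULL in the user's first month).
  SELECT *, LAG(active) OVER(PARTITION BY user_id ORDER BY activity_month) AS active_prev
  FROM flags
)
SELECT FORMAT_DATE('%Y-%m', activity_month) AS month_viewed_at,
  COUNTIF(activity_month = first_month) AS ut_new,
  COUNTIF(active AND active_prev) AS ut_retained,
  COUNTIF(NOT active) AS ut_churned,
  COUNTIF(active AND NOT active_prev) AS ut_resurrected
FROM typed
GROUP BY month_viewed_at
ORDER BY month_viewed_at DESC
```

Traced by hand against the sample data in the question, this should give the four rows of the desired table (1/1/1/1 for 2019-04 down to 1/0/0/0 for 2019-01). Note that a user who stays away keeps being counted as churned in every later month, which follows the "or a user who was already churned in the previous month" part of the definition.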