Firebase export to BigQuery: retention cohorts query

Geo*_*rge 10 firebase google-bigquery firebase-analytics

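Firebase offers split testing through Firebase Remote Config, but it lacks the ability to filter retention in the cohorts section by user properties (or by any property, actually).

To solve this, I am looking at BigQuery, since Firebase Analytics provides a way to export its data to that service.

But I am stuck on many questions, and Google has no answers or examples that might point me in the right direction.

General questions:

As a first step, I need to aggregate data that reproduces what the Firebase cohorts report shows, so I can be sure my calculations are right: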

[Image: Firebase cohorts]

The next step should just be applying constraints to the queries so that they match custom user properties.

Here is what I have so far:

[Image]

The main issue is a big difference in the user counts. Sometimes it is about 100 users, but sometimes it is close to 1000.

This is the approach I use:

# 1

# Count users with `user_dim.first_open_timestamp_micros` 
# in specified period (w0 – week 1)
# this is the way firebase group users to cohorts 
# (who started app on the same day or during the same week) 
# https://support.google.com/firebase/answer/6317510

SELECT
  COUNT(DISTINCT user_dim.app_info.app_instance_id) as count
FROM
  (
   TABLE_DATE_RANGE
    (
     [admob-app-id-xx:xx_IOS.app_events_], 
     TIMESTAMP('2016-11-20'), 
     TIMESTAMP('2016-11-26')
    )
  )
WHERE
  STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros, '%Y-%m-%d')
  BETWEEN '2016-11-20' AND '2016-11-26'

# 2

# For each next period count events with 
# same first_open_timestamp
# Here is example for one of the weeks. 
# week 0 is Nov20-Nov26, week 1 is Nov27-Dec03

SELECT
  COUNT(DISTINCT user_dim.app_info.app_instance_id) as count
FROM
  (
   TABLE_DATE_RANGE
    (
     [admob-app-id-xx:xx_IOS.app_events_], 
     TIMESTAMP('2016-11-27'), 
     TIMESTAMP('2016-12-03')
    )
  )
WHERE
  STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros, '%Y-%m-%d')
  BETWEEN '2016-11-20' AND '2016-11-26'

# 3

# Now we have users for each week w0, w1, ... w5
# Calculate retention for each of them
# retention week 1 = w1 / w0 * 100 = 25.72181359
# rw2 = w2 / w0 * 100
# ...
# rw5 = w5 / w0 * 100

# 4 

# Shift week 0 by one and repeat from step 1
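The four steps above can be sketched in one pass over exported rows. This is only an illustration in Python, not the Firebase implementation: the function name `weekly_retention` and the flattened `(app_instance_id, first_open_date, event_date)` tuples are assumptions made for the example, and retention for week N is assumed to be measured against the week 0 cohort size.

```python
from collections import defaultdict
from datetime import date


def weekly_retention(events, cohort_start, weeks):
    """Return retention percentages for week 0 .. weeks-1 of one cohort.

    events: iterable of (app_instance_id, first_open_date, event_date).
    This flattened shape is hypothetical, not the export schema itself.
    """
    active = defaultdict(set)  # week index -> distinct users seen that week
    for user_id, first_open, event_day in events:
        # Keep only users whose first open falls inside the cohort week.
        if not 0 <= (first_open - cohort_start).days < 7:
            continue
        week = (event_day - cohort_start).days // 7
        if 0 <= week < weeks:
            active[week].add(user_id)
    cohort_size = len(active[0])
    if cohort_size == 0:
        return []
    # Every week is compared against the week 0 cohort size.
    return [round(len(active[w]) / cohort_size * 100, 2) for w in range(weeks)]


events = [
    ("a", date(2016, 11, 20), date(2016, 11, 20)),
    ("a", date(2016, 11, 20), date(2016, 11, 28)),
    ("b", date(2016, 11, 22), date(2016, 11, 22)),
    ("b", date(2016, 11, 22), date(2016, 12, 5)),
    ("c", date(2016, 11, 26), date(2016, 11, 26)),
    ("d", date(2016, 11, 27), date(2016, 11, 27)),  # belongs to the next cohort
]
print(weekly_retention(events, date(2016, 11, 20), 3))
```

Shifting `cohort_start` by seven days and calling the function again corresponds to step 4.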

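BigQuery query tips request

Any tips and directions on how to build a complex query that aggregates and calculates all the data needed for this task in a single step are much appreciated.

Here is the BigQuery Export schema, if needed.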

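Side questions:

  • Why are all the user_dim.device_info.device_id and user_dim.device_info.resettable_device_id values null?
  • user_dim.app_info.app_id is missing from the docs (in case someone from Firebase support reads this question).
  • How should event_dim.timestamp_micros and event_dim.previous_timestamp_micros be used? I cannot figure out their purpose.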

PS

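It would be good if someone from the Firebase team answered this question. Five months ago there was a mention of extending the cohorts feature with filtering, or of showing BigQuery examples, but nothing has happened. Firebase Analytics is the way to go, they say; Google Analytics is deprecated, they say. Now I am spending a second day learning BigQuery to build my own solution on top of the existing analytics tools. I know, Stack Overflow is not the place for this comment, but what are you thinking? Split testing may dramatically influence the retention of my app. My app does not sell anything, and funnels and events are not valuable metrics in many cases.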

Mik*_*ant 13

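Any tips and directions on how to build a complex query that aggregates and calculates all the data needed for this task in a single step are much appreciated.

Yes, generic BigQuery will work fine.

The version below is not the most generic one, but it can give you an idea.
In this example I am using the Stack Overflow Data available in Google BigQuery Public Datasets.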

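The first sub-select, activities, is in most cases the only one you need to rewrite to reflect the specifics of your data.
What it does:
a. It defines the period to use for the analysis.
In the example below it is a month: FORMAT_DATE('%Y-%m', ...)
But you can use year, week, day, or anything else, respectively:
• By year: FORMAT_DATE('%Y', DATE(answers.creation_date)) AS period
• By week: FORMAT_DATE('%Y-%W', DATE(answers.creation_date)) AS period
• By day: FORMAT_DATE('%Y-%m-%d', DATE(answers.creation_date)) AS period
• ...
b. It also "filters" the type of events/activity you need to analyze,
for example, WHERE CONCAT('|', questions.tags, '|') LIKE '%|google-bigquery|%' looks for answers to questions tagged google-bigquery.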
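As a rough illustration of points a and b outside of SQL, here is a small Python sketch. The helper names `period_label` and `has_tag` are made up for this example. It relies on Python's `strftime` accepting the same `%Y`, `%m`, `%W`, and `%d` elements used above, and on the tags field being a pipe-delimited string.

```python
from datetime import date

# strftime patterns mirroring the FORMAT_DATE patterns listed above.
PERIOD_FORMATS = {
    "year": "%Y",
    "month": "%Y-%m",
    "week": "%Y-%W",  # %W: week of the year, Monday as the first day
    "day": "%Y-%m-%d",
}


def period_label(day, granularity):
    """Label a date with the period it belongs to."""
    return day.strftime(PERIOD_FORMATS[granularity])


def has_tag(tags, tag):
    """Same idea as CONCAT('|', tags, '|') LIKE '%|tag|%'."""
    # Wrapping both sides in '|' makes the match exact, not a substring hit.
    return f"|{tag}|" in f"|{tags}|"


d = date(2016, 11, 20)
print({g: period_label(d, g) for g in PERIOD_FORMATS})
print(has_tag("google-bigquery|sql", "google-bigquery"))     # exact tag
print(has_tag("google-bigquery-ui|sql", "google-bigquery"))  # only a prefix
```

Because each label puts the most significant part first, sorting the labels as strings also sorts them chronologically, which is what the query depends on when it takes MIN(period) and orders by period.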

The rest of the sub-queries are more generic and can mostly be used as is.

#standardSQL
WITH activities AS (
  SELECT answers.owner_user_id AS id,
    FORMAT_DATE('%Y-%m', DATE(answers.creation_date)) AS period
  FROM `bigquery-public-data.stackoverflow.posts_answers` AS answers
  JOIN `bigquery-public-data.stackoverflow.posts_questions` AS questions
  ON questions.id = answers.parent_id
  WHERE CONCAT('|', questions.tags, '|') LIKE '%|google-bigquery|%' 
  GROUP BY id, period
), cohorts AS (
  SELECT id, MIN(period) AS cohort FROM activities GROUP BY id
), periods AS (
  SELECT period, ROW_NUMBER() OVER(ORDER BY period) AS num
  FROM (SELECT DISTINCT cohort AS period FROM cohorts)
), cohorts_size AS (
  SELECT cohort, periods.num AS num, COUNT(DISTINCT activities.id) AS ids 
  FROM cohorts JOIN activities ON activities.period = cohorts.cohort AND cohorts.id = activities.id
  JOIN periods ON periods.period = cohorts.cohort
  GROUP BY cohort, num
), retention AS (
  SELECT cohort, activities.period AS period, periods.num AS num, COUNT(DISTINCT cohorts.id) AS ids
  FROM periods JOIN activities ON activities.period = periods.period
  JOIN cohorts ON cohorts.id = activities.id 
  GROUP BY cohort, period, num 
)
SELECT 
  CONCAT(cohorts_size.cohort, ' - ',  FORMAT("%'d", cohorts_size.ids), ' users') AS cohort, 
  retention.num - cohorts_size.num AS period_lag, 
  retention.period as period_label,
  ROUND(retention.ids / cohorts_size.ids * 100, 2) AS retention , retention.ids AS rids
FROM retention
JOIN cohorts_size ON cohorts_size.cohort = retention.cohort
WHERE cohorts_size.cohort >= FORMAT_DATE('%Y-%m', DATE('2015-01-01'))
ORDER BY cohort, period_lag, period_label  

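You can visualize the result of the above query with the tool of your choice.
Note: you can use either period_lag or period_label.
See the difference in their use in the examples below.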

With period_lag:

[Image: retention chart using period_lag]

With period_label:

[Image: retention chart using period_label]
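To make the difference between period_lag and period_label concrete, here is a hedged Python sketch that mimics the cohorts, periods, cohorts_size, and retention steps on a handful of made-up rows. The function `cohort_retention` and the sample data are illustrative only and are not part of the query.

```python
from collections import defaultdict


def cohort_retention(activities):
    """activities: iterable of (user_id, period), period like '2015-01'.

    Returns rows of (cohort, period_lag, period_label, retention_percent).
    """
    activities = set(activities)  # one row per user and period
    cohort_of = {}
    for user_id, period in activities:  # cohort = first active period
        cohort_of[user_id] = min(period, cohort_of.get(user_id, period))
    # Like the query, number only the periods in which some cohort starts.
    num = {p: i for i, p in enumerate(sorted(set(cohort_of.values())), start=1)}
    size = defaultdict(int)
    for cohort in cohort_of.values():
        size[cohort] += 1
    active = defaultdict(set)
    for user_id, period in activities:
        if period in num:  # activity in other periods is dropped by the join
            active[(cohort_of[user_id], period)].add(user_id)
    return [
        (cohort, num[period] - num[cohort], period,
         round(len(users) / size[cohort] * 100, 2))
        for (cohort, period), users in sorted(active.items())
    ]


sample = [
    ("u1", "2015-01"), ("u1", "2015-02"), ("u1", "2015-03"),
    ("u2", "2015-01"),
    ("u3", "2015-02"), ("u3", "2015-03"),
    ("u4", "2015-03"),
]
for row in cohort_retention(sample):
    print(row)
```

In the output, period_lag lines every cohort up at 0, which makes cohorts comparable by age, while period_label keeps the calendar period, which makes it easier to spot effects tied to a particular date.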

  • I will try to put something together. Meanwhile, if you like the answer, consider voting on it :o) (2 upvotes)