如何使用 BigQuery 计算 DAU/MAU(参与度)

Fel*_*ffa 2 google-bigquery

DAU 和 MAU(每日活跃用户和每月活跃用户)是衡量用户参与度的既定方法。

如何使用 SQL 和 Google BigQuery 获取这些数字?

Fel*_*ffa 5

2019年标准SQL更新:


(要了解 DAU/MAU 的实用性,请参阅http://blog.compariscope.wefi.com/mobile-app-usage-dau-mau等文章)

让我们使用 BigQuery 中存储的 reddit 评论数据。我们想要找出 9 月份 Reddit 子版块“AskReddit”的每日活跃用户/活跃用户比率:

SELECT day, dau, mau, INTEGER(100*dau/mau) daumau
FROM (
  SELECT day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
  FROM (
    SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, author
    FROM [fh-bigquery:reddit_comments.2015_09]
    WHERE subreddit='AskReddit') a
  JOIN (
    SELECT stopday, EXACT_COUNT_DISTINCT(author) mau
    FROM (SELECT created_utc, subreddit, author FROM [fh-bigquery:reddit_comments.2015_09], [fh-bigquery:reddit_comments.2015_08]) a
    CROSS JOIN (
      SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) stopday
      FROM [fh-bigquery:reddit_comments.2015_09]
      GROUP BY 1
    ) b
    WHERE subreddit='AskReddit'
    AND SEC_TO_TIMESTAMP(created_utc) BETWEEN DATE_ADD(stopday, -30, 'day') AND TIMESTAMP(stopday)
    GROUP BY 1
  ) b
  ON a.day=b.stopday
  GROUP BY 1
)
ORDER BY 1
Run Code Online (Sandbox Code Playgroud)

此查询获取 9 月份每天的 DAU,并查看 8 月份的数据以获取以每个 DAU 日结束的每 30 天期间的 MAU。这需要大量处理 (30x),如果我们只计算 9 月份的一个 MAU,并继续使用该值作为分母,我们可以获得几乎相同的结果:

SELECT day, dau, mau, INTEGER(100*dau/mau) daumau
FROM (
  SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
  FROM [fh-bigquery:reddit_comments.2015_09] a
  CROSS JOIN (
    SELECT EXACT_COUNT_DISTINCT(author) mau
    FROM [fh-bigquery:reddit_comments.2015_09]
    WHERE subreddit='AskReddit'
  ) b
  WHERE subreddit='AskReddit'
  GROUP BY 1
)
ORDER BY 1
Run Code Online (Sandbox Code Playgroud)

这是一个更简单的查询,可以更快地为我们带来几乎相同的结果。

现在获取本月该 Reddit 子版块的平均值:

SELECT ROUND(100*AVG(dau/mau), 2) daumau
FROM (
  SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
  FROM [fh-bigquery:reddit_comments.2015_09] a
  CROSS JOIN (
    SELECT EXACT_COUNT_DISTINCT(author) mau
    FROM [fh-bigquery:reddit_comments.2015_09]
    WHERE subreddit='AskReddit'
  ) b
  WHERE subreddit='AskReddit'
  GROUP BY 1
)
Run Code Online (Sandbox Code Playgroud)

这告诉我们,“AskReddit”在 9 月份的参与度为 8.95%。

最后一站,如何比较各个 Reddit 子版块的参与度:

SELECT ROUND(100*AVG(dau)/MAX(mau), 2) avg_daumau, MAX(mau) mau, subreddit
FROM (
  SELECT a.subreddit, DATE(SEC_TO_TIMESTAMP(created_utc)) day,
         EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
  FROM [fh-bigquery:reddit_comments.2015_09] a
  JOIN (
    SELECT EXACT_COUNT_DISTINCT(author) mau, subreddit
    FROM [fh-bigquery:reddit_comments.2015_09]
    GROUP BY 2
  ) b
  ON a.subreddit=b.subreddit
  WHERE mau>50000
  GROUP BY 1, 2
)

GROUP BY subreddit
ORDER BY 1
Run Code Online (Sandbox Code Playgroud)

在此输入图像描述