Don*_*ker 1 google-analytics google-bigquery
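I'm working in BigQuery and trying to pull custom dimensions alongside non-custom dimensions. The analytics are sent from an app, and basically I want a table with these columns: UserID (custom dimension), platformID (custom dimension), ScreenName (basically the app version of "page name"), and date. The metric is "number of screen views", grouped by all of these dimensions. This is what it looks like:
Screenshot of the GA report:
In BigQuery I can get numbers that check out (compared with the GA report above) until I add custom dimensions. As soon as I add the custom dimensions, the numbers stop making any sense.
I know custom dimensions are nested in BigQuery, so at first I made sure to use FLATTEN. Then I tried without flattening and got the same result. The numbers make no sense (hundreds of times larger than in the GA interface).
My queries are below (one without FLATTEN and one with FLATTEN).
P.S. Ideally I would like to use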
count(hits)
instead of
count(hits.appInfo.screenName)
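but I keep getting errors when I select hits in the subquery.
My query without FLATTEN is below. Can you help me figure out why all the data gets messed up as soon as I add custom dimensions?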
SELECT
date,
hits.appInfo.version,
hits.appInfo.screenName,
UserIdd,
platform,
count(hits.appInfo.screenName)
FROM (
SELECT
date,
hits.appInfo.version,
hits.appInfo.screenName,
max(case when hits.customdimensions.index = 5 then hits.customdimensions.value end) within record as UserIdd,
max(case when hits.customdimensions.index = 20 then hits.customdimensions.value end) within record as platform
FROM
TABLE_DATE_RANGE([fiery-cabinet-97820:87025718.ga_sessions_], TIMESTAMP('2017-04-04'), TIMESTAMP('2017-04-04'))
)
where UserIdd is not null
and platform = 'Android'
GROUP BY
1,
2,
3,
4,
5
ORDER BY
6 DESC
Here is my query with FLATTEN (same problem: the numbers make no sense):
SELECT
date,
hits.appInfo.version,
customDimensions.index,
customDimensions.value,
hits.appInfo.screenName,
UserIdd,
count(hits.appInfo.screenName)
FROM (FLATTEN(( FLATTEN((
SELECT
date,
hits.appInfo.version,
customDimensions.value,
customDimensions.index,
hits.appInfo.screenName,
max(case when hits.customdimensions.index = 5 then hits.customdimensions.value end) within record as UserIdd,
hits.type
FROM
TABLE_DATE_RANGE([fiery-cabinet-97820:87025718.ga_sessions_], TIMESTAMP('2017-04-04'), TIMESTAMP('2017-04-04'))), customDimensions.value)),hits.type))
WHERE
customDimensions.value = 'Android'
and customDimensions.index = 20
and UserIdd is not null
GROUP BY
1,
2,
3,
4,
5,
6
ORDER BY
7 DESC
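I'm not convinced that hits.customDimensions.* will always contain user-scoped dimensions (and I'm guessing your userId dimension is user-scoped).
Specifically, user-scoped dimensions should be queried from customDimensions, not from hits.customDimensions.
In principle, the first step is to make customDimensions compatible with hits.*, either via flattening or via scoped aggregation. I'll explain the flattening approach.
GA records have the shape (customDimensions[], hits[], ...), which is no good for querying both fields at once. We start by flattening these to (customDimensionN, hits[], ...).
One level up, by selecting fields under hits.*, we implicitly flatten the table into (customDimensionN, hitN) records. We filter these to keep only the records matching (customDimension5, appviewN).
The last step is to count everything.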
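One way to check where index 5 is actually populated is to count the sessions that carry it at each level. The sketch below is written in Standard SQL rather than the legacy dialect used elsewhere in this thread. The table name `ga_sessions_20170404` is an assumption derived from the date range in the question, and the column aliases are made up for illustration.

```sql
#standardSQL
-- Assumption: the daily table for 2017-04-04 follows the usual
-- ga_sessions_YYYYMMDD naming of the GA export.
SELECT
  COUNT(*) AS sessions,
  COUNTIF(has_session_cd5) AS sessions_with_session_level_cd5,
  COUNTIF(has_hit_cd5) AS sessions_with_hit_level_cd5
FROM (
  SELECT
    -- TRUE if index 5 appears in the session-level customDimensions array.
    EXISTS(
      SELECT 1
      FROM UNNEST(t.customDimensions) AS cd
      WHERE cd.index = 5) AS has_session_cd5,
    -- TRUE if index 5 appears in the customDimensions array of any hit.
    EXISTS(
      SELECT 1
      FROM UNNEST(t.hits) AS h, UNNEST(h.customDimensions) AS hcd
      WHERE hcd.index = 5) AS has_hit_cd5
  FROM `fiery-cabinet-97820.87025718.ga_sessions_20170404` AS t
)
```

If the second count comes back at or near zero while the first does not, that would support reading the dimension from customDimensions rather than hits.customDimensions.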
SELECT date, v, sn, uid, COUNT(*)
FROM (
SELECT
date,
hits.appInfo.version v,
hits.appInfo.screenName sn,
customDimensions.value uid
FROM
FLATTEN((
SELECT customDimensions.*, hits.*, date
FROM
TABLE_DATE_RANGE(
[fiery-cabinet-97820:87025718.ga_sessions_],
TIMESTAMP('2017-04-04'),
TIMESTAMP('2017-04-04'))),
customDimensions)
WHERE hits.type = "APPVIEW" and customDimensions.index = 5)
GROUP BY 1,2,3,4
ORDER BY 5 DESC
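Here is another, equivalent approach. It uses the scoped aggregation trick recommended in the GA BQ cookbook. Looking at the query explanation, however, MAX(IF(...)) WITHIN RECORD appears to be fairly expensive: it triggers extra COMPUTE and AGGREGATE phases in the first stage. Still, it gets bonus points for being a little easier to digest.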
SELECT sn, uid, date, v, COUNT(*)
FROM (
SELECT
MAX(IF(customDimensions.index = 5, customDimensions.value, null)) within record as uid,
hits.appInfo.screenname as sn,
date,
hits.appInfo.version as v,
hits.type
FROM
TABLE_DATE_RANGE([fiery-cabinet-97820:87025718.ga_sessions_], TIMESTAMP('2017-04-04'), TIMESTAMP('2017-04-04')))
WHERE hits.type = "APPVIEW" and uid is not null
GROUP BY 1,2,3,4
ORDER BY 5 DESC
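I'm not yet familiar with BQ's Standard SQL dialect, but it looks like it would simplify this kind of wrangling. If you're going to write many queries like this, you may want to get your head around it.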
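For comparison, here is a rough sketch of how the same count might look in Standard SQL. It is not taken from the cookbook and has not been run against this dataset. The table name `ga_sessions_20170404` is an assumption derived from the date range above, and the alias `screen_views` is made up for illustration. The inner query pulls custom dimension 5 out of the session-level array with a scalar subquery, and UNNEST then produces one row per hit.

```sql
#standardSQL
SELECT
  t.date,
  h.appInfo.version AS v,
  h.appInfo.screenName AS sn,
  t.uid,
  COUNT(*) AS screen_views
FROM (
  SELECT
    s.date,
    s.hits,
    -- Session-level custom dimension 5; MAX guards against the scalar
    -- subquery returning more than one row.
    (SELECT MAX(cd.value)
     FROM UNNEST(s.customDimensions) AS cd
     WHERE cd.index = 5) AS uid
  -- Assumption: daily table name derived from the 2017-04-04 date range.
  FROM `fiery-cabinet-97820.87025718.ga_sessions_20170404` AS s
) AS t,
UNNEST(t.hits) AS h  -- one output row per hit in the session
WHERE h.type = 'APPVIEW'
  AND t.uid IS NOT NULL
GROUP BY 1, 2, 3, 4
ORDER BY screen_views DESC
```

Because only hits is unnested, each hit is counted once, which avoids multiplying hits by the number of custom dimension entries.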