Ves*_*nen 8 firebase google-bigquery firebase-analytics
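About 1-2% of the Firebase Analytics events exported to BigQuery appear to be duplicates. What is the best practice for removing them?
At the moment the client does not send a per-session counter with each event. Such a counter would give an unambiguous way to remove duplicate events, so I suggest Firebase implement it. In the meantime, what is a good way to remove the duplicates? Look at the client-side user_pseudo_id, event_timestamp and event_name fields and drop all but one row for each identical triplet?
How does the event_bundle_sequence_id field work? Do duplicates have the same value in this field or different values? In other words, are duplicate events sent in the same bundle or in different bundles?
Does Firebase plan to remove these duplicates earlier in processing, either for Firebase Analytics itself or on export to BigQuery?
Standard SQL for checking for duplicates in one day's events: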
WITH n_dups AS
(
  -- Rows beyond the first for each (event_name, event_timestamp, user_pseudo_id) triplet
  SELECT event_name, event_timestamp, user_pseudo_id, COUNT(1) - 1 AS n_duplicates
  FROM `project.dataset.events_20190610`
  GROUP BY event_name, event_timestamp, user_pseudo_id
)
SELECT n_duplicates, COUNT(1) AS n_cases
FROM n_dups
GROUP BY n_duplicates
ORDER BY n_cases DESC
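The question of whether duplicates share an event_bundle_sequence_id can be checked against the export itself. The following is a minimal sketch, assuming the same placeholder table as above and assuming that a "duplicate" means rows sharing the (user_pseudo_id, event_name, event_timestamp) triplet. It shows what the data contains, not how Firebase assigns bundle IDs.

```sql
-- Sketch: for each duplicated triplet, count how many distinct bundle IDs it spans.
-- `project.dataset.events_20190610` is the placeholder table from the question.
WITH dup_groups AS (
  SELECT
    user_pseudo_id,
    event_name,
    event_timestamp,
    COUNT(*) AS n_rows,
    COUNT(DISTINCT event_bundle_sequence_id) AS n_bundles
  FROM `project.dataset.events_20190610`
  GROUP BY user_pseudo_id, event_name, event_timestamp
  HAVING COUNT(*) > 1
)
SELECT
  -- TRUE when every row of the duplicated triplet carries one and the same bundle ID.
  -- A group whose bundle IDs are all NULL has n_bundles = 0 and lands in FALSE.
  n_bundles = 1 AS same_bundle,
  COUNT(*) AS n_duplicate_groups
FROM dup_groups
GROUP BY same_bundle
ORDER BY same_bundle
```

If most groups fall under same_bundle = FALSE, adding event_bundle_sequence_id to the deduplication key would keep the duplicates instead of removing them.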
We use the QUALIFY clause in BigQuery to deduplicate Firebase events:
SELECT
  *
FROM
  `project.dataset.events_*`
QUALIFY
  ROW_NUMBER() OVER (
    PARTITION BY
      user_pseudo_id,
      event_name,
      event_timestamp,
      TO_JSON_STRING(event_params)
  ) = 1
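Without an ORDER BY inside the window, which of the duplicate rows survives is arbitrary, and the wildcard scans every daily table. A possible variant is sketched below. The date range is a hypothetical placeholder, and ordering by event_bundle_sequence_id and event_server_timestamp_offset is an assumption about a reasonable tie-breaker, not something the export guarantees to be meaningful.

```sql
-- Sketch: same deduplication key, restricted to a date range, with a tie-breaker.
SELECT
  *
FROM
  `project.dataset.events_*`
WHERE
  -- Hypothetical range; limits which daily tables the wildcard reads.
  _TABLE_SUFFIX BETWEEN '20190601' AND '20190630'
QUALIFY
  ROW_NUMBER() OVER (
    PARTITION BY
      user_pseudo_id,
      event_name,
      event_timestamp,
      TO_JSON_STRING(event_params)
    -- Assumed tie-breaker: keep the row from the lowest bundle ID.
    ORDER BY
      event_bundle_sequence_id,
      event_server_timestamp_offset
  ) = 1
```

The surviving row differs from the arbitrary pick only in columns outside the partition key, so the tie-breaker matters mainly when downstream queries read those columns.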
Columns used in the QUALIFY partition:
- name: user_pseudo_id
  description: Autogenerated pseudonymous ID for the user -
    Unique identifier for a specific installation of application on a client device,
    e.g. "938642951.1666427135".
    All events generated by that device will be tagged with this pseudonymous ID,
    so that you can relate events from the same user together.
- name: event_name
  description: Event name, e.g. "app_launch", "session_start", "login", "logout" etc.
- name: event_timestamp
  description: The time (in microseconds, UTC) at which the event was logged on the client,
    e.g. "1666529002225262".
- name: event_params
  description: A repeated record (ARRAY) of the parameters associated with this event.
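Because event_params is an ARRAY, it cannot be used directly as a partitioning or grouping key, which is why it is serialized with TO_JSON_STRING. Whether including it changes the result can be measured. The sketch below, assuming the single-day placeholder table from the question, compares the row count with the number of distinct keys with and without the serialized parameters.

```sql
-- Sketch: how many rows each candidate deduplication key would keep.
WITH keyed AS (
  SELECT
    user_pseudo_id,
    event_name,
    event_timestamp,
    TO_JSON_STRING(event_params) AS params_json
  FROM `project.dataset.events_20190610`
)
SELECT
  (SELECT COUNT(*) FROM keyed) AS n_rows,
  -- Rows kept when deduplicating on the triplet only
  (SELECT COUNT(*)
   FROM (SELECT DISTINCT user_pseudo_id, event_name, event_timestamp FROM keyed)
  ) AS n_distinct_triplets,
  -- Rows kept when event_params is part of the key
  (SELECT COUNT(*)
   FROM (SELECT DISTINCT user_pseudo_id, event_name, event_timestamp, params_json FROM keyed)
  ) AS n_distinct_with_params
```

If the last two numbers are equal, the triplet alone identifies duplicates in that day's data. If n_distinct_with_params is larger, some events share a triplet but differ in parameters and would be dropped by the triplet-only approach.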