Yan*_*ike 1 sql arrays count google-bigquery
I have the following table and the query below:

SELECT
  fullVisitorId,
  COUNT(fullVisitorId) AS id_count,
  ARRAY_AGG(trafficSource.medium) AS trafic_medium
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`
GROUP BY
  fullVisitorId
ORDER BY
  id_count DESC
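For each value in the trafic_medium column (e.g. cpc, referral, organic, etc.), I am trying to work out how often it occurs in the array. Ideally I would add a new column "count" that shows how often each value occurs, like this: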
+-----------+---------+------+
| array_agg | medium | count|
+-----------+---------+------+
| 123 | cpc | 2 |
+-----------+---------+------+
| | organic | 1 |
+-----------+---------+------+
| | cpc | 2 |
+-----------+---------+------+
| 456 | organic | 2 |
+-----------+---------+------+
| | organic | 2 |
+-----------+---------+------+
| | cpc | 1 |
+-----------+---------+------+
I'm new to SQL, so I'm quite stuck.
What I have tried so far:
WITH medium AS
(
  SELECT
    fullVisitorId,
    COUNT(fullVisitorId) AS id_count,
    ARRAY_AGG(trafficSource.medium) AS trafic_medium
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`
  GROUP BY
    fullVisitorId
  ORDER BY
    id_count DESC
)
SELECT
  fullVisitorId,
  trafic_medium,
  (SELECT AS STRUCT ANY_VALUE(trafic_medium) AS name, COUNT(*) AS count
   FROM
     UNNEST(trafic_medium) AS trafic_medium) AS trafic_medium_2
FROM
  medium
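This is based on this thread: How to count of elements in a bigquery array field
However, this only shows a count for the single value picked by ANY_VALUE, not for each distinct value.
I would appreciate some help!
P.S. I am running this in BigQuery on the "bigquery-public-data.google_analytics_sample" dataset.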
Below is for BigQuery Standard SQL and should get you started:
#standardSQL
SELECT id, trafic_medium,
  ARRAY(
    SELECT AS STRUCT medium, COUNT(1) `count`
    FROM t.trafic_medium medium
    GROUP BY medium
  ) stats
FROM `project.dataset.table` t
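Why the attempt in the question returns only one value: without a GROUP BY, the aggregates in the scalar subquery treat the whole array as a single group, so you get one struct holding an arbitrary medium and the total element count. A minimal sketch of the contrast, assuming a hypothetical one-row CTE named sample with a literal array (these names are mine, not from the question):

```sql
#standardSQL
-- Hypothetical literal array standing in for one visitor's trafic_medium.
WITH sample AS (
  SELECT ['cpc', 'organic', 'cpc'] AS trafic_medium
)
SELECT
  -- No GROUP BY: the whole array is one group, so this is a single
  -- struct with an arbitrary medium and the total element count (3).
  (SELECT AS STRUCT ANY_VALUE(medium) AS name, COUNT(*) AS `count`
   FROM t.trafic_medium AS medium) AS single_struct,
  -- GROUP BY medium: one struct per distinct value, collected by ARRAY().
  ARRAY(
    SELECT AS STRUCT medium, COUNT(1) AS `count`
    FROM t.trafic_medium AS medium
    GROUP BY medium
  ) AS stats
FROM sample AS t
```

The second column is the shape you want: cpc with 2 and organic with 1.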
Applied to the sample/dummy data from your question, the same ARRAY subquery looks like this:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 123 id, ['cpc', 'organic', 'cpc'] trafic_medium UNION ALL
  SELECT 456, ['organic', 'organic', 'cpc']
)
SELECT id, trafic_medium,
  ARRAY(
    SELECT AS STRUCT medium, COUNT(1) `count`
    FROM t.trafic_medium medium
    GROUP BY medium
  ) stats
FROM `project.dataset.table` t
-- ORDER BY id
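The result is one row per id, with stats holding one (medium, count) pair per distinct medium: for id 123, cpc 2 and organic 1; for id 456, organic 2 and cpc 1.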
As an option, you can use the version below:
#standardSQL
SELECT id,
  ARRAY(
    -- One output element per array element, each paired with its count
    SELECT AS STRUCT medium, `count`
    FROM t.trafic_medium medium
    LEFT JOIN (
      -- Count per distinct medium within the same array
      SELECT AS STRUCT medium, COUNT(1) `count`
      FROM t.trafic_medium medium
      GROUP BY medium
    ) stats
    USING(medium)
  ) trafic_medium
FROM `project.dataset.table` t
-- ORDER BY id
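Applied to the same dummy data, it outputs one (medium, count) pair per array element, so a repeated value carries its count each time it appears: for id 123, (cpc, 2), (organic, 1), (cpc, 2); for id 456, (organic, 2), (organic, 2), (cpc, 1).
This version looks closer to your expected result.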
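To run the same idea against the public table from your question instead of dummy data, something like the following should work. Treat it as a sketch: the CTE name sessions, the alias s, and the ORDER BY inside ARRAY() are my own additions, and I have not run it against the dataset.

```sql
#standardSQL
WITH sessions AS (
  -- Same aggregation as in the question: one row per visitor,
  -- with all of that visitor's mediums collected into an array.
  SELECT
    fullVisitorId,
    COUNT(fullVisitorId) AS id_count,
    ARRAY_AGG(trafficSource.medium) AS trafic_medium
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`
  GROUP BY fullVisitorId
)
SELECT
  fullVisitorId,
  id_count,
  ARRAY(
    -- One (medium, count) struct per distinct medium for this visitor.
    SELECT AS STRUCT medium, COUNT(1) AS `count`
    FROM s.trafic_medium AS medium
    GROUP BY medium
    -- Optional: makes the element order inside stats deterministic.
    ORDER BY `count` DESC, medium
  ) AS stats
FROM sessions AS s
ORDER BY id_count DESC
```

The ORDER BY sits in the outer query rather than in the CTE, because that is where it determines the order of the final result.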