Asp*_*per 5 sql google-bigquery
My table looks like this:
author | group
daniel | group1,group2,group3,group4,group5,group8,group10
adam | group2,group5,group11,group12
harry | group1,group10,group15,group13,group15,group18
...
...
I would like my output to look like this:
author1 | author2 | intersection | union
daniel | adam | 2 | 9
daniel | harry | 2 | 11
adam | harry | 0 | 10
Thank you.

Try the following (for BigQuery Legacy SQL):
SELECT
  a.author AS author1,
  b.author AS author2,
  SUM(a.item=b.item) AS intersection,
  EXACT_COUNT_DISTINCT(a.item) + EXACT_COUNT_DISTINCT(b.item) - intersection AS [union]
FROM FLATTEN((
  SELECT author, SPLIT([group]) AS item FROM YourTable
), item) AS a
CROSS JOIN FLATTEN((
  SELECT author, SPLIT([group]) AS item FROM YourTable
), item) AS b
WHERE a.author < b.author
GROUP BY 1,2

Added a solution for BigQuery Standard SQL:
WITH YourTable AS (
  SELECT 'daniel' AS author, 'group1,group2,group3,group4,group5,group8,group10' AS grp UNION ALL
  SELECT 'adam' AS author, 'group2,group5,group11,group12' AS grp UNION ALL
  SELECT 'harry' AS author, 'group1,group10,group13,group15,group18' AS grp
),
tempTable AS (
  SELECT author, SPLIT(grp) AS grp
  FROM YourTable
)
SELECT
  a.author AS author1,
  b.author AS author2,
  (SELECT COUNT(1) FROM a.grp) AS count1,
  (SELECT COUNT(1) FROM b.grp) AS count2,
  (SELECT COUNT(1) FROM UNNEST(a.grp) AS agrp JOIN UNNEST(b.grp) AS bgrp ON agrp = bgrp) AS intersection_count,
  (SELECT COUNT(1) FROM (SELECT * FROM UNNEST(a.grp) UNION DISTINCT SELECT * FROM UNNEST(b.grp))) AS union_count
FROM tempTable a
JOIN tempTable b
ON a.author < b.author

Reasons I like this one:
If/when you try it, make sure to uncheck the Use Legacy SQL checkbox under Show Options.
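
One thing worth noting: harry's row in the question lists group15 twice, while the sample data in the Standard SQL query above lists it once. If a list can contain repeated groups, the JOIN-based intersection counts a repeated group once per copy. A minimal sketch of one way to guard against that, assuming duplicates should be ignored, is to deduplicate each array up front. The alias `g` and the `ARRAY(SELECT DISTINCT ...)` step are my additions, not part of the original answer:

```sql
WITH YourTable AS (
  SELECT 'daniel' AS author, 'group1,group2,group3,group4,group5,group8,group10' AS grp UNION ALL
  SELECT 'adam' AS author, 'group2,group5,group11,group12' AS grp UNION ALL
  -- harry's row as written in the question, with group15 repeated
  SELECT 'harry' AS author, 'group1,group10,group15,group13,group15,group18' AS grp
),
tempTable AS (
  -- Split the comma-separated string and keep each group only once (assumption: duplicates do not count)
  SELECT author, ARRAY(SELECT DISTINCT g FROM UNNEST(SPLIT(grp)) AS g) AS grp
  FROM YourTable
)
SELECT
  a.author AS author1,
  b.author AS author2,
  -- Elements of a.grp that also appear in b.grp
  (SELECT COUNT(1) FROM UNNEST(a.grp) AS g WHERE g IN UNNEST(b.grp)) AS intersection_count,
  -- |A union B| = |A| + |B| - |A intersect B|, valid because both arrays are deduplicated
  ARRAY_LENGTH(a.grp) + ARRAY_LENGTH(b.grp)
    - (SELECT COUNT(1) FROM UNNEST(a.grp) AS g WHERE g IN UNNEST(b.grp)) AS union_count
FROM tempTable a
JOIN tempTable b
ON a.author < b.author
```

With duplicates ignored, harry has five distinct groups, so the union for daniel and harry should come out as 10 and for adam and harry as 9. The question's expected 11 and 10 appear to count the repeated group15 twice, so check which behavior you actually want.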