con*_*lee 13 sql google-bigquery
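This question isn't about solving a particular problem. It's about understanding what actually happens behind the scenes in a common SQL idiom for flattening arrays. There is some magic going on, and I'd like to peek behind the syntactic sugar to see what is happening.
Let's consider the following table, t1:
Now suppose we had a function called FLATTEN that takes a column of type array and unpacks every array in that column, leaving one row for each value in each array. If we ran SELECT FLATTEN(numbers_array) AS flattened_numbers FROM t1, we would expect the following, which we'll call t2:
In SQL, CROSS JOIN combines the rows of two tables by pairing every row of the first table with every row of the second. So if we run SELECT id, flattened.flattened_numbers from t1 CROSS JOIN flattened, we get
Now, FLATTEN is just an imaginary function, and you can see that combining it with CROSS JOIN isn't very useful, because every original value of the id column gets mixed with the flattened_numbers from every original row. Because we have no WHERE clause to select only the rows we want from the CROSS JOIN, everything gets mixed up.
The pattern people actually use to flatten arrays looks like this: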
SELECT id, flattened_numbers FROM t1 CROSS JOIN UNNEST(t1.numbers_array) AS flattened_numbers, which produces
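But I don't understand why the CROSS JOIN UNNEST pattern works. Because the CROSS JOIN has no WHERE clause, I would expect it to behave just like the FLATTEN function I outlined above, where every unnested value is combined with every row of t1.
Can someone "unpack" what is actually happening in the CROSS JOIN UNNEST pattern that ensures each row is joined only with its own nested values (and not with the nested values of other rows)?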
Ell*_*ard 11
The best way to think about this is by looking at what happens on a row-by-row basis. Setting up some input data, we have:
WITH t1 AS (
SELECT 1 AS id, [0, 1] AS numbers_array UNION ALL
SELECT 2, [2, 4, 5]
)
...
(I'm using a third element for the second row to make things more interesting). If we just select from it, we get output that looks like this:
WITH t1 AS (
SELECT 1 AS id, [0, 1] AS numbers_array UNION ALL
SELECT 2, [2, 4, 5]
)
SELECT * FROM t1;
+----+---------------+
| id | numbers_array |
+----+---------------+
| 1  | [0, 1]        |
| 2  | [2, 4, 5]     |
+----+---------------+
Now let's talk about unnesting. The UNNEST function takes an array and returns a value table of the array's element type. Whereas most BigQuery tables are SQL tables defined as a collection of columns, a value table has rows of some value type. For numbers_array, UNNEST(numbers_array) returns a value table whose value type is INT64, since numbers_array is an array with an element type of INT64. This value table contains all of the elements in numbers_array for the current row from t1.
For the row with an id of 1, the contents of the value table returned by UNNEST(numbers_array) are:
+-----+
| f0_ |
+-----+
| 0   |
| 1   |
+-----+
This is the same as what we get with the following query:
SELECT * FROM UNNEST([0, 1]);
UNNEST([0, 1]) in this case means "create a value table from the INT64 values 0 and 1".
Similarly, for the row with an id of 2, the contents of the value table returned by UNNEST(numbers_array) are:
+-----+
| f0_ |
+-----+
| 2   |
| 4   |
| 5   |
+-----+
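As a small illustration of the value table idea, the single unnamed column can be given a name with an alias, and WITH OFFSET exposes each element's position in the array. This is only a sketch: the alias names number and pos are chosen here for illustration and do not come from the question.

```sql
-- Name the element of the value table produced by UNNEST.
-- WITH OFFSET adds each element's zero-based position in the array.
SELECT number, pos
FROM UNNEST([2, 4, 5]) AS number WITH OFFSET AS pos
ORDER BY pos;
```

This should return the rows (2, 0), (4, 1) and (5, 2). The array here is a literal, so the value table is the same every time it is evaluated.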
Now let's talk about how CROSS JOIN fits into the picture. In most cases, you use CROSS JOIN between two uncorrelated tables. In other words, the contents of the table on the right of the CROSS JOIN are not defined by the current contents of the table on the left.
In the case of arrays and UNNEST, however, the contents of the value table produced by UNNEST(numbers_array) change depending on the current row of t1. When we join the two tables, we get the cross product of the current row from t1 with all of the rows from UNNEST(numbers_array). For example:
WITH t1 AS (
SELECT 1 AS id, [0, 1] AS numbers_array UNION ALL
SELECT 2, [2, 4, 5]
)
SELECT id, number
FROM t1
CROSS JOIN UNNEST(numbers_array) AS number;
+----+--------+
| id | number |
+----+--------+
| 1  | 0      |
| 1  | 1      |
| 2  | 2      |
| 2  | 4      |
| 2  | 5      |
+----+--------+
numbers_array has two elements in the first row and three elements in the second, so we get 2 + 3 = 5 rows in the result of the query.
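One way to see that the right-hand side really is evaluated per row is to add a row whose array is empty. The sketch below assumes a third row with an id of 3, which is not part of the data above. For that row, UNNEST(numbers_array) yields a value table with no rows, so a CROSS JOIN would contribute nothing for it, while LEFT JOIN keeps the row and pairs it with a NULL number.

```sql
WITH t1 AS (
  SELECT 1 AS id, [0, 1] AS numbers_array UNION ALL
  SELECT 2, [2, 4, 5] UNION ALL
  SELECT 3, ARRAY<INT64>[]  -- hypothetical extra row with an empty array
)
SELECT id, number
FROM t1
-- The array is unnested separately for each row of t1. LEFT JOIN keeps
-- id 3 even though its value table is empty.
LEFT JOIN UNNEST(numbers_array) AS number
ORDER BY id, number;
```

Replacing LEFT JOIN with CROSS JOIN in this sketch should drop the row for id 3 entirely, leaving the same five rows as before (2 + 3 + 0 = 5).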
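To answer the question about how this differs from flattening numbers_array first and then performing the CROSS JOIN, let's look at the results of this query: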
WITH t1 AS (
SELECT 1 AS id, [0, 1] AS numbers_array UNION ALL
SELECT 2, [2, 4, 5]
), t2 AS (
SELECT number
FROM t1
CROSS JOIN UNNEST(numbers_array) AS number
)
SELECT number
FROM t2;
+--------+
| number |
+--------+
| 0      |
| 1      |
| 2      |
| 4      |
| 5      |
+--------+
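In this case, t2 is a SQL table with a single column named number that contains those values. If you perform a CROSS JOIN between t1 and t2, you get a true cross product of all rows: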
WITH t1 AS (
SELECT 1 AS id, [0, 1] AS numbers_array UNION ALL
SELECT 2, [2, 4, 5]
), t2 AS (
SELECT number
FROM t1
CROSS JOIN UNNEST(numbers_array) AS number
)
SELECT id, numbers_array, number
FROM t1
CROSS JOIN t2;
+----+---------------+--------+
| id | numbers_array | number |
+----+---------------+--------+
| 1  | [0, 1]        | 0      |
| 1  | [0, 1]        | 1      |
| 1  | [0, 1]        | 2      |
| 1  | [0, 1]        | 4      |
| 1  | [0, 1]        | 5      |
| 2  | [2, 4, 5]     | 0      |
| 2  | [2, 4, 5]     | 1      |
| 2  | [2, 4, 5]     | 2      |
| 2  | [2, 4, 5]     | 4      |
| 2  | [2, 4, 5]     | 5      |
+----+---------------+--------+
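So what's the difference between this query and the previous one with CROSS JOIN UNNEST(numbers_array)? In this case, the contents of t2 don't change for each row from t1. For the first row in t1, there are five rows in t2. For the second row in t1, there are five rows in t2. As a result, the CROSS JOIN between the two of them returns 5 + 5 = 10 rows in total.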
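For comparison, the pre-flattened t2 can be made to reproduce the five-row result, but only by adding the kind of filter the question anticipated. The following is a sketch rather than a recommended pattern, and it relies on an assumption that happens to hold for this sample data: no value appears in more than one row's array.

```sql
WITH t1 AS (
  SELECT 1 AS id, [0, 1] AS numbers_array UNION ALL
  SELECT 2, [2, 4, 5]
), t2 AS (
  SELECT number
  FROM t1
  CROSS JOIN UNNEST(numbers_array) AS number
)
SELECT id, number
FROM t1
CROSS JOIN t2
-- Keep only the pairs where the flattened value belongs to this row's array.
WHERE number IN UNNEST(numbers_array)
ORDER BY id, number;
```

If two rows shared a value, this filter would pair the shared value with both rows and produce extra rows. The correlated CROSS JOIN UNNEST form needs no such filter, because each row only ever sees its own elements.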