How to pivot on dynamic values in Snowflake

Mar*_*Roy 1 pivot dynamic snowflake-cloud-data-platform

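I want to pivot a table on a field that can contain "dynamic" values (not always known in advance).

I can make it work by hard-coding the values (which is undesirable):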

SELECT *
FROM my_table
  pivot(SUM(amount) FOR type_id IN (1,2,3,4,5,20,50,83,141,...));

But I can't get it to work when a query supplies the values dynamically:

SELECT *
FROM my_table
  pivot(SUM(amount) FOR type_id IN (SELECT id FROM types));
---
090150 (22000): Single-row subquery returns more than one row. 

SELECT *
FROM my_table
  pivot(SUM(amount) FOR type_id IN (SELECT ARRAY_AGG(id) FROM types));
---
001038 (22023): SQL compilation error:                                          
Can not convert parameter 'my_table.type_id' of type [NUMBER(38,0)] into expected type [ARRAY]

Is there a way to do this?

jbm*_*jbm 5

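I don't think this is possible in native SQL, but I wrote an article and published some code showing how my team does it by generating the query from Python.

You can call the Python script directly, passing arguments similar to the options Excel offers for pivot tables: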

python generate_pivot_query.py                  \
    --dbtype snowflake --database mydb          \
    --host myhost.url --port 5432               \
    --user me --password myp4ssw0rd             \
    --base-columns customer_id                  \
    --pivot-columns category                    \
    --exclude-columns order_id                  \
    --aggfunction-mappings amount=sum           \
    myschema orders

Or, if you use Airflow, you can use a CreatePivotTableOperator to create the task directly.
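The idea of generating the pivot query from Python can be illustrated with a minimal sketch. This is not the code of generate_pivot_query.py; the helper names sql_literal and build_pivot_query are hypothetical, and the sketch assumes the distinct pivot values have already been fetched (for example with SELECT id FROM types) and that table and column names come from a trusted source.

```python
from typing import Iterable, Union

PivotValue = Union[int, float, str]


def sql_literal(value: PivotValue) -> str:
    """Render a Python value as a SQL literal: numbers bare, strings quoted."""
    if isinstance(value, bool):
        raise TypeError("boolean pivot values are not handled in this sketch")
    if isinstance(value, (int, float)):
        return str(value)
    # Double embedded single quotes so the string literal stays well-formed.
    return "'" + str(value).replace("'", "''") + "'"


def build_pivot_query(
    table: str,
    agg: str,
    value_col: str,
    pivot_col: str,
    values: Iterable[PivotValue],
) -> str:
    """Build a PIVOT query whose IN list is filled from `values`.

    Identifiers are interpolated as-is (assumption: they are trusted);
    only the pivot values are turned into literals.
    """
    # Drop duplicates while keeping the order in which values were supplied.
    literals = [sql_literal(v) for v in dict.fromkeys(values)]
    if not literals:
        raise ValueError("need at least one pivot value")
    in_list = ", ".join(literals)
    return (
        f"SELECT *\n"
        f"FROM {table}\n"
        f"  PIVOT({agg}({value_col}) FOR {pivot_col} IN ({in_list}))"
    )


if __name__ == "__main__":
    # In practice these ids would be the rows returned by SELECT id FROM types.
    ids = [1, 2, 3, 4, 5, 20, 50, 83, 141]
    print(build_pivot_query("my_table", "SUM", "amount", "type_id", ids))
```

The generated text is then sent to Snowflake as an ordinary statement, so the database only ever sees a PIVOT with a literal IN list.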


Fel*_*ffa 5

I wrote a Snowflake stored procedure that produces dynamic pivots inside Snowflake.

3 steps:

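  1. Run the query to pivot (the procedure expects its result to have columns named pivot_column and pivot_value)
  2. Call the stored procedure: call pivot_prev_results()
  3. Find the results: select * from table(result_scan(last_query_id(-2)))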
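The three steps above can be sketched as a small Python helper. This is an illustration under assumptions, not part of the answer: run_dynamic_pivot is a hypothetical name, cursor is assumed to be a DB-API style cursor bound to a single Snowflake session in which pivot_prev_results() already exists, and no other statement may run in that session between the steps, because last_query_id() is relative to the session's query history.

```python
from typing import Any, List, Sequence


def run_dynamic_pivot(cursor: Any, source_query: str) -> List[Sequence[Any]]:
    """Run the three steps in order on one session and return the pivoted rows.

    `cursor` is assumed to expose execute(sql) and fetchall(), as DB-API
    cursors do. `source_query` must return columns named pivot_column and
    pivot_value, which is what the stored procedure reads.
    """
    # Step 1: run the query whose result should be pivoted.
    cursor.execute(source_query)
    # Step 2: let the stored procedure build and run the pivot.
    cursor.execute("call pivot_prev_results()")
    # Step 3: read the pivoted result back, using the statement given above.
    cursor.execute("select * from table(result_scan(last_query_id(-2)))")
    return cursor.fetchall()


# Hypothetical usage; the table and column names are placeholders:
# rows = run_dynamic_pivot(
#     cursor,
#     "select customer_id, category as pivot_column, amount as pivot_value"
#     " from orders",
# )
```

Keeping all three statements on the same cursor matters more than the helper itself: running them from different sessions would make last_query_id() point at unrelated queries.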

The procedure:

create or replace procedure pivot_prev_results()
returns string
language javascript
execute as caller as
$$
  var cols_query = `
      select '\\'' 
        || listagg(distinct pivot_column, '\\',\\'') within group (order by pivot_column)
        || '\\'' 
      from table(result_scan(last_query_id(-1)))
  `;
  var stmt1 = snowflake.createStatement({sqlText: cols_query});
  var results1 = stmt1.execute();
  results1.next();
  var col_list = results1.getColumnValue(1);
  
  // Declare with var so the query text does not become an implicit global.
  var pivot_query = `
         select * 
         from (select * from table(result_scan(last_query_id(-2)))) 
         pivot(max(pivot_value) for pivot_column in (${col_list}))
     `;
  var stmt2 = snowflake.createStatement({sqlText: pivot_query});
  stmt2.execute();
  return `select * from table(result_scan('${stmt2.getQueryId()}'));\n  select * from table(result_scan(last_query_id(-2)));`;
$$;