Dynamically apply a function to all columns of a Postgres table

Question

Mik*_*19x 3 sql postgresql null dynamic-sql plpgsql

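Using Postgres 13.1, I want to apply a forward fill function to all columns of a table. The forward fill function is explained in my earlier question:

How to do forward fill as a PL/PGSQL function

However, in that case the columns and the table are specified. I want to take that code and apply it to an arbitrary table, i.e. specify a table and have the forward fill applied to each of its columns.

Using this table as an example: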

CREATE TABLE example(row_num int, id int, str text, val integer);
INSERT INTO example VALUES
  (1, 1, '1a', NULL)
, (2, 1, NULL,    1)
, (3, 2, '2a',    2)
, (4, 2, NULL, NULL)
, (5, 3, NULL, NULL)
, (6, 3, '3a',   31)
, (7, 3, NULL, NULL)
, (8, 3, NULL,   32)
, (9, 3, '3b', NULL)
, (10,3, NULL, NULL)
;

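I start with the following working base for the function. I call it by passing in some variable names. Note that the first one is a table name, not a column name. The function takes the table name, builds an array of all its column names, and then outputs the names.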

create or replace function col_collect(tbl text, id text, row_num text)
    returns text[]
    language plpgsql as
$func$
declare
    tmp text[];
    col text;
begin
    select array (
            select column_name
            from information_schema."columns" c
            where table_name = tbl
            ) into tmp;
    foreach col in array tmp
    loop
        raise notice 'col: %', col;
    end loop;
    return tmp;
end
$func$;

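I want to apply the "forward fill" function I got from the earlier question to each column of the table. UPDATE seems to be the right approach. So this is the preceding function, where I replace the raise notice with an UPDATE run through execute, so that I can pass in the table name: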

create or replace function col_collect(tbl text, id text, row_num text)
    returns void
    language plpgsql as
$func$
declare
    tmp text[];
    col text;
begin
    select array (
            select column_name
            from information_schema."columns" c
            where table_name = tbl
            ) into tmp;
    foreach col in array tmp
    loop
        execute 'update '||tbl||' 
                set '||col||' = gapfill('||col||') OVER w AS '||col||' 
                where '||tbl||'.row_num = '||col||'.row_num
                window w as (PARTITION BY '||id||' ORDER BY '||row_num||') 
                returning *;';
    end loop;
end
$func$;

-- call the function
select col_collect('example','id','row_num')

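The preceding function fails with a syntax error. I have tried many variations, but they all failed. Helpful answers on SO were here and here. The aggregate function I am trying to apply (as a window function) is: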

CREATE OR REPLACE FUNCTION gap_fill_internal(s anyelement, v anyelement)
  RETURNS anyelement
  LANGUAGE plpgsql AS
$func$
BEGIN
RETURN COALESCE(v, s);  -- that's all!
END
$func$;

CREATE AGGREGATE gap_fill(anyelement) ( 
  SFUNC = gap_fill_internal, 
  STYPE = anyelement 
);
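
To show what the aggregate above does when used as a window function, here is a minimal sketch run against the example table from this question. The column list is written out by hand; nothing here is dynamic yet.

```sql
-- Forward fill str and val within each id, ordered by row_num.
-- Each NULL is replaced by the most recent non-NULL value in its partition.
SELECT row_num, id
     , gap_fill(str) OVER w AS str
     , gap_fill(val) OVER w AS val
FROM   example
WINDOW w AS (PARTITION BY id ORDER BY row_num)
ORDER  BY row_num;

-- Result for the sample data:
--  row_num | id | str  | val
--        1 |  1 | 1a   | NULL
--        2 |  1 | 1a   | 1
--        3 |  2 | 2a   | 2
--        4 |  2 | 2a   | 2
--        5 |  3 | NULL | NULL
--        6 |  3 | 3a   | 31
--        7 |  3 | 3a   | 31
--        8 |  3 | 3a   | 32
--        9 |  3 | 3b   | 32
--       10 |  3 | 3b   | 32
```

Row 5 stays NULL because no earlier non-NULL value exists in its partition.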

My questions are:

Is this a good approach, and if so, what am I doing wrong? Or
is there a better way to do this?

Answer 1

Erw*_*ter 5

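What you are asking for is not a trivial task. You should be comfortable with PL/pgSQL. I do not recommend this kind of dynamic SQL to beginners; it is too powerful.

That said, let's dive in. Buckle up!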

CREATE OR REPLACE FUNCTION f_gap_fill_update(_tbl regclass, _id text, _row_num text, OUT nullable_columns int, OUT updated_rows int)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _pk  text  := quote_ident(_row_num);
   _sql text;
BEGIN   
   SELECT INTO _sql, nullable_columns
          concat_ws(E'\n'
          , 'UPDATE ' || _tbl || ' t'
          , 'SET   (' || string_agg(        quote_ident(a.attname), ', ') || ')'
          , '    = (' || string_agg('u.' || quote_ident(a.attname), ', ') || ')'
          , 'FROM  (' 
          , '   SELECT ' || _pk
          , '        , ' || string_agg(format('gap_fill(%1$I) OVER w AS %1$I', a.attname), ', ')
          , '   FROM   ' || _tbl
          , format('   WINDOW w AS (PARTITION BY %I ORDER BY %s)', _id, _pk)
          , '   ) u'
          , format('WHERE t.%1$s = u.%1$s', _pk)
          , 'AND  (' || string_agg('t.' || quote_ident(a.attname), ', ') || ') IS DISTINCT FROM'
          , '     (' || string_agg('u.' || quote_ident(a.attname), ', ') || ')'
          )
        , count(*) -- AS _col_ct
   FROM  (
      SELECT a.attname
      FROM   pg_attribute a
      WHERE  a.attrelid = _tbl
      AND    a.attnum > 0
      AND    NOT a.attisdropped
      AND    NOT a.attnotnull
      ORDER  BY a.attnum
      ) a;

   IF nullable_columns = 0 THEN
      RAISE EXCEPTION 'No nullable columns found in table >>%<<', _tbl;
   ELSIF _sql IS NULL THEN
      RAISE EXCEPTION 'SQL string is NULL. Should not occur!';
   END IF;
   
   -- RAISE NOTICE '%', _sql;       -- debug
   EXECUTE _sql;              -- execute
   GET DIAGNOSTICS updated_rows = ROW_COUNT; 
END
$func$;

Example call:

SELECT * FROM f_gap_fill_update('example', 'id', 'row_num');

db<>fiddle here

The function is state of the art. It generates and executes a single UPDATE statement that fills all nullable columns of the table at once.

It uses pg_catalog.pg_attribute instead of the information schema.
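
As a rough illustration of that catalog lookup, the sketch below lists the nullable columns of the example table the same way the subquery inside f_gap_fill_update does. The data_type output column is an addition for readability and is not part of the function.

```sql
-- List user columns that may contain NULL, in table order.
-- 'example'::regclass resolves the name via search_path and fails if the table is missing.
SELECT a.attname
     , a.atttypid::regtype AS data_type   -- added for illustration only
FROM   pg_attribute a
WHERE  a.attrelid = 'example'::regclass
AND    a.attnum > 0          -- skip system columns
AND    NOT a.attisdropped    -- skip dropped columns
AND    NOT a.attnotnull      -- skip columns defined NOT NULL
ORDER  BY a.attnum;
```

Unlike the information_schema query in the question, which filters on table_name only, the regclass lookup identifies exactly one table even if the same name exists in several schemas.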

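Note the final WHERE clause, which prevents (possibly expensive) empty updates. Only rows that actually change are written. See:

How do I (or can I) SELECT DISTINCT on multiple columns?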
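
A small sketch of why the row comparison uses IS DISTINCT FROM rather than <> or =: the columns being filled contain NULLs, and ordinary comparison operators return NULL as soon as a NULL field is involved. The literal values are made up for illustration.

```sql
SELECT (1, NULL::int) = (1, NULL::int)                AS plain_equal   -- NULL, not true
     , (1, NULL::int) IS DISTINCT FROM (1, NULL::int) AS unchanged_row -- false: row is skipped
     , (1, NULL::int) IS DISTINCT FROM (1, 2)         AS changed_row;  -- true: row is updated
```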

Moreover, only nullable columns (those not defined NOT NULL) are considered at all, to avoid unnecessary work.

It uses ROW syntax in the UPDATE to keep the code simple. See:

SQL update fields of one table from fields of another one

The function returns two integer values, nullable_columns and updated_rows, reporting what the names suggest.

The function defends properly against SQL injection.
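
The three quoting tools used in f_gap_fill_update can be tried in isolation. This is only a sketch; the input strings are invented, and the last expression assumes the example table exists.

```sql
SELECT quote_ident('row_num')            AS plain_name   -- row_num (no quoting needed)
     , quote_ident('Row Num')            AS quoted_name  -- "Row Num"
     , format('%I', 'val; DROP TABLE x') AS neutralized  -- "val; DROP TABLE x" (one identifier)
     , 'example'::regclass               AS checked_tbl; -- example; raises an error for unknown tables
```

Because every identifier passes through one of these before being concatenated into the query string, user input can only ever name a column or table, never inject additional SQL.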

About GET DIAGNOSTICS:

Calculate number of rows affected by batch query in PostgreSQL
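
A minimal sketch of GET DIAGNOSTICS outside the function, using an anonymous DO block against the example table. The UPDATE is deliberately a no-op write and serves only to produce a row count; the variable name _rows is made up.

```sql
DO
$do$
DECLARE
   _rows int;   -- hypothetical variable name
BEGIN
   -- Demo statement only: rewrites each matching row with its current value.
   UPDATE example SET val = val WHERE id = 3;

   -- ROW_COUNT holds the number of rows processed by the last SQL command.
   GET DIAGNOSTICS _rows = ROW_COUNT;
   RAISE NOTICE 'rows updated: %', _rows;   -- 6 with the sample data (rows 5 to 10)
END
$do$;
```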

For a table tbl with the nullable columns str, val and col1, the generated query has this form:

UPDATE tbl t
SET   (str, val, col1)
    = (u.str, u.val, u.col1)
FROM  (
   SELECT row_num
        , gap_fill(str) OVER w AS str, gap_fill(val) OVER w AS val
        , gap_fill(col1) OVER w AS col1
   FROM   tbl
   WINDOW w AS (PARTITION BY id ORDER BY row_num)
   ) u
WHERE t.row_num = u.row_num
AND  (t.str, t.val, t.col1) IS DISTINCT FROM
     (u.str, u.val, u.col1)

The function above updates, but does not return rows. A second function, f_gap_fill_select, can instead return rows of varying type by using a polymorphic return type.
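
The definition of f_gap_fill_select is not shown here. The following is only a sketch of how such a function might look, not necessarily the author's implementation. It assumes the first argument is a typed NULL of the table's row type (as in the call below), derives the table from it with pg_typeof(), and relies on the gap_fill aggregate defined in the question.

```sql
CREATE OR REPLACE FUNCTION f_gap_fill_select(_tbl_type anyelement, _id text, _row_num text)
  RETURNS SETOF anyelement
  LANGUAGE plpgsql AS
$func$
DECLARE
   -- The row type of a table carries the table's name, so it can be cast back to regclass.
   _tbl regclass := pg_typeof(_tbl_type)::text::regclass;
   _sql text;
BEGIN
   -- Build one SELECT that forward fills nullable columns and passes the rest through.
   SELECT INTO _sql
          'SELECT '
       || string_agg(CASE WHEN a.attnotnull
                          THEN quote_ident(a.attname)
                          ELSE format('gap_fill(%1$I) OVER w AS %1$I', a.attname)
                     END, ', ' ORDER BY a.attnum)
       || ' FROM ' || _tbl
       || format(' WINDOW w AS (PARTITION BY %I ORDER BY %I)', _id, _row_num)
   FROM   pg_attribute a
   WHERE  a.attrelid = _tbl
   AND    a.attnum > 0
   AND    NOT a.attisdropped;

   IF _sql IS NULL THEN
      RAISE EXCEPTION 'No columns found for table %', _tbl;
   END IF;

   -- Column order and types match the table's row type, as RETURN QUERY requires.
   RETURN QUERY EXECUTE _sql;
END
$func$;
```

The typed NULL carries no data; it exists only so that Postgres can resolve anyelement to the table's row type at call time.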

Call (note the special syntax!):

SELECT * FROM f_gap_fill_select(NULL::example, 'id', 'row_num');

db<>fiddle here

About returning a polymorphic row type:

Refactor a PL/pgSQL function to return the output of various SELECT queries
