Query to get a count of values for each column

use*_*289 2 postgresql count postgresql-9.6

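I have a large table of vendor-supplied data (which I can't change much) with about 315 columns. I suspect that many of the columns are unused (or at least used inconsistently).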

I'd like a query that gives me a count of the values in each column of the table.

For example:

CREATE TABLE foo AS VALUES
    ( null   , 'xyz'  , 'pdq'  , null ),
    ( 'abc'  , 'def'  , 'ghj'  , null ),
    ( 'hsh'  , 'fff'  , 'oko'  , null );

So this would produce something like:

Col1 | 2
Col2 | 3
Col3 | 3
Col4 | 0

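Edit: to clarify, I know I can use COUNT, but I was hoping for a query that first loops over the system tables, so that I don't have to write 315 count expressions by hand. Thanks!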

Something like this (pseudocode):

FOR column_names IN SELECT * FROM information_schema.columns WHERE 
table_schema = 'public' AND table_name = 'vendor'
LOOP
 RAISE NOTICE 'doing %s', quote_ident(column_names.column_name);
 SELECT count(column_names.column_name) from vendor      
END LOOP;

Eva*_*oll 5

You can do the first part easily like this,

SELECT FORMAT(
        E'SELECT %s\nFROM %I.%I.%I;' -- query template
        , string_agg(  -- generate the select list for query template
                FORMAT('count(DISTINCT %I) AS %I', column_name, column_name)
                , E',\n\t'
                ORDER BY ordinal_position -- keep the table's column order
        ),
        table_catalog, -- not strictly required, but future safe
        table_schema,
        table_name
)
FROM information_schema.columns
WHERE table_name = 'foo'
GROUP BY table_catalog, table_schema, table_name; 

That will return a query like this,

SELECT count(DISTINCT column1) AS column1,
        count(DISTINCT column2) AS column2,
        count(DISTINCT column3) AS column3,
        count(DISTINCT column4) AS column4
FROM ecarroll.public.foo;

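This is almost what you want, except that the result comes back as a single row and still needs to be pivoted into one row per column.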

 column1 | column2 | column3 | column4 
---------+---------+---------+---------
       2 |       3 |       3 |       0

To pivot it, we can use unnest(ARRAY[cols]) AS col_name, so essentially we have to generate

  • dynamic SQL to do the count()
  • more dynamic SQL wrapped around it to do the pivot.
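
As a small standalone illustration of the pivot step only, here is a sketch that feeds literal counts to unnest() in place of the real subquery. The values 2, 3, 3, 0 are simply the counts from the example table, hard-coded as an assumption for the demo.

```sql
-- unnest() turns the array into one row per element;
-- WITH ORDINALITY adds a 1-based position column named "ordinality".
SELECT ordinality AS column_number, distinct_values
FROM unnest(ARRAY[2, 3, 3, 0]) WITH ORDINALITY
        AS distinct_values;
```

This returns four rows, numbered 1 through 4, carrying the values 2, 3, 3 and 0. The generated query does the same thing, with the array built from the count columns of the inner SELECT.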

Like this,

SELECT FORMAT(
        $$
        SELECT ordinality AS column_number, distinct_values -- the col#, and count
        FROM (
                SELECT %s      -- This was the query we
                FROM %I.%I.%I  -- used previously
        ) AS t
        CROSS JOIN unnest(ARRAY[%s]) WITH ORDINALITY -- Here we use unnest(array)
                AS distinct_values;                  -- to pivot the table
        $$,
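        string_agg(
                FORMAT('count(DISTINCT %I) AS %I', column_name, column_name)
                , E',\n\t'
                ORDER BY ordinal_position -- keep the table's column order
        ),
        table_catalog,
        table_schema,
        table_name,
        string_agg(FORMAT('%I', column_name), ', ' ORDER BY ordinal_position) -- quoted names for ARRAY[...]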
)
FROM information_schema.columns
WHERE table_name = 'foo'
GROUP BY table_catalog, table_schema, table_name;

Which returns a query like this,

    SELECT ordinality AS column_number, distinct_values
    FROM (
            SELECT count(DISTINCT column1) AS column1,
    count(DISTINCT column2) AS column2,
    count(DISTINCT column3) AS column3,
    count(DISTINCT column4) AS column4
            FROM ecarroll.public.foo
    ) AS t
    CROSS JOIN unnest(ARRAY[column1, column2, column3, column4]) WITH ORDINALITY
            AS distinct_values;

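In psql, end the generating statement with \gexec instead of the semicolon, so that the generated query is executed right away, and you'll get,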

 column_number | distinct_values 
---------------+-----------------
             1 |               2
             2 |               3
             3 |               3
             4 |               0
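
If psql and \gexec are not available, the loop sketched in the question can also be written as an anonymous PL/pgSQL block. The following is a minimal sketch, not a tested drop-in. It assumes the table is public.foo, and it uses count(col), which counts non-NULL values, rather than count(DISTINCT col), which counts distinct non-NULL values. The variable names col and n are made up for the illustration.

```sql
DO $$
DECLARE
    col record;  -- current row from information_schema.columns
    n   bigint;  -- number of non-NULL values in that column
BEGIN
    FOR col IN
        SELECT table_schema, table_name, column_name
        FROM information_schema.columns
        WHERE table_schema = 'public'  -- assumed schema
          AND table_name   = 'foo'     -- assumed table
        ORDER BY ordinal_position
    LOOP
        -- Build and run one count query per column; %I quotes identifiers.
        EXECUTE format('SELECT count(%I) FROM %I.%I',
                       col.column_name, col.table_schema, col.table_name)
        INTO n;
        RAISE NOTICE '% | %', col.column_name, n;
    END LOOP;
END
$$;
```

This scans the table once per column, so on a wide, large table the single generated query above is likely to be cheaper. For the example table it should report 2, 3, 3 and 0 for column1 through column4, the same as the distinct counts, because no column repeats a value.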