Find the empty (all-NULL) columns of a table in PostgreSQL

Seb*_*Seb 18 schema postgresql null database-design

What query returns the names of a table's columns in which every row is NULL?

Jac*_*las 14

Test bed:

create role stack;
create schema authorization stack;
set role stack;

create table my_table as 
select generate_series(0,9) as id, 1 as val1, null::integer as val2;

create table my_table2 as 
select generate_series(0,9) as id, 1 as val1, null::integer as val2, 3 as val3;

Function:

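create function has_nonnulls(p_schema in text, p_table in text, p_column in text)
                returns boolean language plpgsql as $$
declare 
  b boolean;
begin
  -- format() with %I quotes each identifier; qualifying the table with
  -- p_schema avoids relying on search_path to find the right table
  execute format('select exists(select * from %I.%I where %I is not null)',
                 p_schema, p_table, p_column) into b;
  return b;
end;$$;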

Query:

select table_schema, table_name, column_name, 
       has_nonnulls(table_schema, table_name, column_name)
from information_schema.columns
where table_schema='stack';

Result:

 table_schema | table_name | column_name | has_nonnulls
--------------+------------+-------------+--------------
 stack        | my_table   | id          | t
 stack        | my_table   | val1        | t
 stack        | my_table   | val2        | f
 stack        | my_table2  | id          | t
 stack        | my_table2  | val1        | t
 stack        | my_table2  | val2        | f
 stack        | my_table2  | val3        | t
(7 rows)

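Additionally, you can get an approximate answer by querying the catalog. A null_frac of 0 indicates that there are no NULLs (and 1 that there are only NULLs), but because the statistics are estimated from a sample, this should be double-checked against the "real" data: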

select tablename, attname, null_frac from pg_stats where schemaname='stack';

 tablename | attname | null_frac
-----------+---------+-----------
 my_table  | id      |         0
 my_table  | val1    |         0
 my_table  | val2    |         1
 my_table2 | id      |         0
 my_table2 | val1    |         0
 my_table2 | val2    |         1
 my_table2 | val3    |         0
(7 rows)
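null_frac only helps for columns that actually have a row in pg_stats, and it is only as current as the last ANALYZE. A minimal sketch of two sanity checks, assuming the stack schema from the test bed: the first query lists columns with no statistics at all (for example in tables that were never analyzed, or in views), and the second shows when each table was last analyzed.

```sql
-- Columns in schema 'stack' that pg_stats has no row for:
-- null_frac says nothing about these, so they need an exact check.
select c.table_name, c.column_name
from information_schema.columns c
left join pg_stats s
       on s.schemaname = c.table_schema
      and s.tablename  = c.table_name
      and s.attname    = c.column_name
where c.table_schema = 'stack'
  and s.attname is null;

-- When were the tables last analyzed? Stale statistics can still
-- report null_frac = 1 for a column that has since received values.
select relname, last_analyze, last_autoanalyze
from pg_stat_user_tables
where schemaname = 'stack';
```

If the timestamps are old or NULL, running ANALYZE on the tables first makes the null_frac figures more trustworthy, though they remain sample-based estimates.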


Den*_*rdy 11

In PostgreSQL, you can get this directly from the statistics:

vacuum analyze; -- if needed

select schemaname, tablename, attname
from pg_stats
where most_common_vals is null
and most_common_freqs is null
and histogram_bounds is null
and correlation is null
and null_frac = 1;

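You may get some false positives, because the statistics come from a sample taken at the last ANALYZE, so recheck the candidate columns once you have found them.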
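One way to do that recheck without creating a helper function is to let query_to_xml() run a dynamically built EXISTS query for each candidate and pull the boolean back out with xpath(). This is a sketch of that technique rather than part of the answer above, assuming the candidates are simply the pg_stats rows with null_frac = 1. The join to pg_tables is there because pg_stats also contains statistics for expression indexes, which cannot be selected from.

```sql
-- Sketch: exact recheck of pg_stats candidates (null_frac = 1).
-- query_to_xml() executes the EXISTS query built by format() for each
-- candidate row; xpath() extracts its single boolean result.
select distinct                      -- inherited-stats rows can repeat a column
       s.schemaname, s.tablename, s.attname,
       (xpath('/row/x/text()',
              query_to_xml(format('select exists(select 1 from %I.%I where %I is not null) as x',
                                  s.schemaname, s.tablename, s.attname),
                           false, true, '')))[1]::text::boolean as has_nonnulls
from pg_stats s
join pg_tables t                     -- ordinary/partitioned tables only
  on t.schemaname = s.schemaname
 and t.tablename  = s.tablename
where s.null_frac = 1
  and s.schemaname not in ('pg_catalog', 'information_schema');
```

A has_nonnulls value of false confirms that the column really is entirely NULL. The exact check sits in the select list rather than the WHERE clause so that it only runs for rows that have already passed the cheap catalog filters.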


Mar*_*ian 1

I'll show you my solution in T-SQL, which works on SQL Server 2008. I'm not familiar with PostgreSQL, but I hope you can find some guidance in my solution.

-- create test table
IF object_id ('dbo.TestTable') is not null
    DROP table testTable
go
create table testTable (
    id int identity primary key clustered,
    nullColumn varchar(100) NULL,
    notNullColumn varchar(100) not null,
    combinedColumn varchar(100) NULL,
    testTime datetime default getdate()
);
go

-- insert test data:
INSERT INTO testTable(nullColumn, notNullColumn, combinedColumn)
SELECT NULL, 'Test', 'Combination'
from sys.objects
union all
SELECT NULL, 'Test2', NULL
from sys.objects

select *
from testTable

-- FIXED SCRIPT FOR KNOWN TABLE (known structure) - find all completely NULL columns
select sum(datalength(id)) as SumColLength,
    'id' as ColumnName
from dbo.testTable
UNION ALL
select sum(datalength(nullColumn)) as SumColLength,
    'nullColumn' as ColumnName
from dbo.testTable
UNION ALL
select sum(datalength(notNullColumn)) as SumColLength,
    'notNullColumn' as ColumnName
from dbo.testTable
UNION ALL
select sum(datalength(combinedColumn)) as SumColLength,
    'combinedColumn' as ColumnName
from dbo.testTable
UNION ALL
select sum(datalength(testTime)) as SumColLength,
    'testTime' as ColumnName
from dbo.testTable

-- DYNAMIC SCRIPT (unknown structure) - find all completely NULL columns
declare @sql varchar(max) = '', @tableName sysname = 'testTable';

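-- QUOTENAME() brackets the identifiers so names with spaces or reserved words still work
SELECT @sql +=
        'select sum(datalength(' + QUOTENAME(c.COLUMN_NAME) + ')) as SumColLength,
    ''' + c.COLUMN_NAME + ''' as ColumnName
from ' + QUOTENAME(c.TABLE_SCHEMA) + '.' + QUOTENAME(c.TABLE_NAME) --as StatementToExecute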
+ '
UNION ALL
'
FROM INFORMATION_SCHEMA.COLUMNS c
WHERE c.TABLE_NAME = @tableName;

SET @sql = left(@sql, len(@sql)-11)
print @sql;
exec (@sql);

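In short, I created a test table with 5 columns. ID and testTime are generated by the identity property and the getdate() function, while the 3 varchar columns are the ones of interest: one contains only NULL values, one contains no NULLs at all, and the third is a mix of both. The end result is that the script reports that column nullColumn is NULL in every row.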

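The idea is to apply the DATALENGTH function (which returns the number of bytes used to represent an expression) to every column: I compute DATALENGTH for each row of each column and sum it per column. Because SUM ignores NULLs and returns NULL when there is no non-NULL value to add, a column whose SUM is NULL has NULL in every row; otherwise it contains some data.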

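Now you need to find the PostgreSQL translation; hopefully colleagues can help you with that. Or maybe there is a nice system view that will show how silly I was to reinvent the wheel :-).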
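A possible PostgreSQL counterpart of the SUM(DATALENGTH(...)) idea is sketched here as an illustration, not a tested port. all_null_columns is a hypothetical name, and count(col) stands in for SUM(DATALENGTH(col)): count() skips NULLs and returns 0 when a column has no non-NULL values. Instead of one UNION ALL branch (and one table scan) per column, it builds a single SELECT with one count per column. It assumes PostgreSQL 9.5 or later for to_jsonb().

```sql
-- Hypothetical helper: returns the columns of p_schema.p_table whose
-- every row is NULL. count(col) plays the role of sum(datalength(col));
-- all counts come from ONE scan. An empty table reports every column.
create or replace function all_null_columns(p_schema text, p_table text)
returns setof text
language plpgsql as $$
declare
  v_select_list text;
  v_counts      jsonb;
begin
  -- e.g. 'count(id) as id, count(val1) as val1, count(val2) as val2'
  select string_agg(format('count(%1$I) as %1$I', column_name), ', '
                    order by ordinal_position)
    into v_select_list
    from information_schema.columns
   where table_schema = p_schema
     and table_name   = p_table;

  if v_select_list is null then
    return;  -- table not found (or no visible columns)
  end if;

  -- to_jsonb() turns the single row of counts into {"id": 10, "val2": 0, ...}
  execute format('select to_jsonb(counts.*) from (select %s from %I.%I) counts',
                 v_select_list, p_schema, p_table)
     into v_counts;

  -- keep only the columns whose non-NULL count is zero
  return query
    select key
      from jsonb_each_text(v_counts)
     where value::bigint = 0;
end;
$$;
```

With the first answer's test bed, select * from all_null_columns('stack', 'my_table2'); should return the single row val2. Scanning the table once matters mostly for wide tables, where one full scan per column adds up quickly.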