How do you find the row count for all your tables in Postgres

mmr*_*ins 362 postgresql count database-table

I'm looking for a way to find the row count for all my tables in Postgres. I know I can do this one table at a time with:

SELECT count(*) FROM table_name;

but I'd like to see the row count for all the tables and then order by that to get an idea of how big all my tables are.

Gre*_*ith 531

There are three ways to get this sort of count, each with its own tradeoffs.

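If you want a true count, you have to execute a SELECT statement like the one you used against each table. This is because PostgreSQL keeps row visibility information in the row itself, not anywhere else, so any accurate count can only be relative to some transaction. You get a count of what that transaction sees at the point in time when it executes. You could automate this to run against every table in the database, but you probably don't need that level of accuracy or want to wait that long.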

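The second approach notes that the statistics collector tracks roughly how many rows are "live" (not deleted or obsoleted by later updates) at any time. This value can be a bit off under heavy activity, but it is generally a good estimate: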

SELECT schemaname,relname,n_live_tup 
  FROM pg_stat_user_tables 
  ORDER BY n_live_tup DESC;

That view can also show you how many rows are dead, which is itself an interesting number to monitor.

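The third way is to note that the system ANALYZE command, which the autovacuum process executes regularly as of PostgreSQL 8.3 to update table statistics, also computes a row estimate. You can grab that one like this: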

SELECT 
  nspname AS schemaname,relname,reltuples
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE 
  nspname NOT IN ('pg_catalog', 'information_schema') AND
  relkind='r' 
ORDER BY reltuples DESC;

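Which of these queries is better to use is hard to say. Normally I make that decision based on whether there is more useful information I also want to use from pg_class or from pg_stat_user_tables. For basic counting purposes, just to see how big things are in general, either should be accurate enough.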

  • For completeness' sake, please add this for the first option (thanks to @a_horse_with_no_name): `with tbl as (SELECT table_schema,table_name FROM information_schema.tables where table_name not like 'pg_%' and table_schema in ('public')) select table_schema, table_name, (xpath('/row/c/text()', query_to_xml(format('select count(*) as c from %I.%I', table_schema, table_name), false, true, '') ))[1]::text::int as rows_n from tbl ORDER BY 3 DESC;` (8 upvotes)
  • The "second approach" query (using `pg_stat_user_tables`) returned mostly zeros in `n_live_tup` for me because `ANALYZE` had never been run. Rather than run `ANALYZE` on every schema/table and wait forever for an answer, I first checked the results using the "third approach", and that one (using `pg_class`) returned very accurate counts. (2 upvotes)

a_h*_*ame 51

Here is a solution that does not require functions to get an accurate count for each table:

select table_schema, 
       table_name, 
       (xpath('/row/cnt/text()', xml_count))[1]::text::int as row_count
from (
  select table_name, table_schema, 
         query_to_xml(format('select count(*) as cnt from %I.%I', table_schema, table_name), false, true, '') as xml_count
  from information_schema.tables
  where table_schema = 'public' --<< change here for the schema you want
) t

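query_to_xml will run the SQL query passed to it and return an XML document with the result (the row count for that table). The outer xpath() then extracts the count from that XML and converts it to a number.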

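The derived table is not strictly necessary, but it makes the xpath() call a bit easier to understand - otherwise the whole query_to_xml() call would have to be passed to the xpath() function.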
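To make the xpath() step a bit more concrete, here is a minimal Python sketch of the same extraction done on the client side. The XML literal is only an assumption about the typical shape of what query_to_xml returns for this query (one `<row>` element with a child named after the column alias), and the helper name `extract_count` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the XML returned by query_to_xml(..., false, true, '')
# for "select count(*) as cnt from ...": a single <row> element.
SAMPLE_XML = """<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <cnt>1306</cnt>
</row>"""


def extract_count(xml_text: str) -> int:
    """Rough equivalent of (xpath('/row/cnt/text()', xml))[1]::text::int."""
    root = ET.fromstring(xml_text)  # root element is <row>
    cnt = root.find("cnt")          # child named after the column alias
    if cnt is None or cnt.text is None:
        raise ValueError("no <cnt> element found in XML")
    return int(cnt.text)


print(extract_count(SAMPLE_XML))
```

In the SQL version the same work happens inside the server, so nothing has to be parsed on the client.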

  • Very clever. Pity there's no `query_to_jsonb()`. (3 upvotes)
  • This gives a true count, whereas the accepted answer does not give the expected result. Thanks! (2 upvotes)

Dan*_*ité 22

To get estimates, see Greg Smith's answer.

To get exact counts, the other answers so far are plagued with some issues, some of them serious (see below). Here's a version that's hopefully better:

CREATE FUNCTION rowcount_all(schema_name text default 'public')
  RETURNS table(table_name text, cnt bigint) as
$$
declare
 table_name text;
begin
  for table_name in SELECT c.relname FROM pg_class c
    JOIN pg_namespace s ON (c.relnamespace=s.oid)
    WHERE c.relkind = 'r' AND s.nspname=schema_name
  LOOP
    RETURN QUERY EXECUTE format('select cast(%L as text),count(*) from %I.%I',
       table_name, schema_name, table_name);
  END LOOP;
end
$$ language plpgsql;

It takes a schema name as a parameter, or public if no parameter is given.

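To work with a specific list of schemas, or a list coming from a query, without modifying the function, it can be called from within a query like this: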

WITH rc(schema_name,tbl) AS (
  select s.n,rowcount_all(s.n) from (values ('schema1'),('schema2')) as s(n)
)
SELECT schema_name,(tbl).* FROM rc;

This produces a 3-column output with the schema, the table and the row count.

Now here are some issues in the other answers that this function avoids:

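  • Table and schema names shouldn't be injected into executable SQL without being quoted, either with quote_ident or with the more modern format() function and its %I format specifier. Otherwise some malicious person may name their table tablename;DROP TABLE other_table, which is perfectly valid as a table name.

  • Even without the SQL injection and funny-character problems, table names may exist in variants that differ only by case. If one table is named ABCD and another abcd, the SELECT count(*) FROM... must use a quoted name, otherwise it will skip ABCD and count abcd twice. The %I format specifier does this automatically.

  • information_schema.tables lists custom composite types in addition to tables, even when table_type is 'BASE TABLE' (!). As a consequence, we can't iterate over information_schema.tables, otherwise we risk running select count(*) from name_of_composite_type, which would fail. OTOH pg_class where relkind='r' should always work fine.

  • The type of COUNT() is bigint, not int. Tables with more than 2.15 billion rows may exist (running a count(*) on them is a bad idea, though).

  • A permanent type need not be created for a function to return a result set with several columns. RETURNS TABLE(definition...) is a better alternative.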
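For anyone building these statements on the client side instead, the first two bullets can be illustrated with a small Python sketch. The `quote_ident` and `count_query` helpers here are hypothetical names written for this example, not PostgreSQL's own functions, and unlike the server-side quote_ident or format('%I') this version always adds quotes rather than only when needed.

```python
def quote_ident(name: str) -> str:
    """Always-quote identifier quoting: wrap the name in double quotes
    and double any embedded double quote. (A simplification: PostgreSQL's
    quote_ident and format('%I') only add quotes when they are needed.)"""
    return '"' + name.replace('"', '""') + '"'


def count_query(schema: str, table: str) -> str:
    """Build a count(*) statement with both identifiers quoted."""
    return f"SELECT count(*) FROM {quote_ident(schema)}.{quote_ident(table)}"


# A hostile table name stays inside a single quoted identifier.
print(count_query("public", "tablename;DROP TABLE other_table"))

# Case variants become two distinct quoted identifiers.
print(count_query("public", "ABCD"))
print(count_query("public", "abcd"))
```

With unquoted concatenation, the first name would be executed as two statements, and both ABCD and abcd would be folded to the same lower-case table.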


ig0*_*774 16

If you don't mind potentially stale data, you can access the same statistics used by the query optimizer.

Something like:

SELECT relname, n_tup_ins - n_tup_del as rowcount FROM pg_stat_all_tables;

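  • I just tried it, and it isn't the right answer. (5 upvotes)
  • @mlissner: If your autovacuum interval is too long or you haven't run a manual `ANALYZE` on the table, the statistics can be way off. It's a question of database load and how the database is configured (if the statistics are updated more frequently they will be more accurate, but that may reduce runtime performance). Ultimately, the only way to get accurate data is to run `select count(*) from table` for all tables. (2 upvotes)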

Aur*_*raf 14

A hacky but practical answer for people trying to evaluate which Heroku plan they need who can't wait for Heroku's slow row counter to refresh:

Basically you want to run \dt in psql and copy the results to your favorite text editor (it will look like this:

 public | auth_group                     | table | axrsosvelhutvw
 public | auth_group_permissions         | table | axrsosvelhutvw
 public | auth_permission                | table | axrsosvelhutvw
 public | auth_user                      | table | axrsosvelhutvw
 public | auth_user_groups               | table | axrsosvelhutvw
 public | auth_user_user_permissions     | table | axrsosvelhutvw
 public | background_task                | table | axrsosvelhutvw
 public | django_admin_log               | table | axrsosvelhutvw
 public | django_content_type            | table | axrsosvelhutvw
 public | django_migrations              | table | axrsosvelhutvw
 public | django_session                 | table | axrsosvelhutvw
 public | exercises_assignment           | table | axrsosvelhutvw

), then run a regex search and replace like this:

^[^|]*\|\s+([^|]*?)\s+\| table \|.*$

to (replacing all matches):

select '\1', count(*) from \1 union

which will yield something very similar to this:

select 'auth_group', count(*) from auth_group union
select 'auth_group_permissions', count(*) from auth_group_permissions union
select 'auth_permission', count(*) from auth_permission union
select 'auth_user', count(*) from auth_user union
select 'auth_user_groups', count(*) from auth_user_groups union
select 'auth_user_user_permissions', count(*) from auth_user_user_permissions union
select 'background_task', count(*) from background_task union
select 'django_admin_log', count(*) from django_admin_log union
select 'django_content_type', count(*) from django_content_type union
select 'django_migrations', count(*) from django_migrations union
select 'django_session', count(*) from django_session
;

(You'll need to remove the last union and add the semicolon at the end manually)

Run it in psql and you're done.

            ?column?            | count
--------------------------------+-------
 auth_group_permissions         |     0
 auth_user_user_permissions     |     0
 django_session                 |  1306
 django_content_type            |    17
 auth_user_groups               |   162
 django_admin_log               |  9106
 django_migrations              |    19
[..]
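If you would rather not do the search and replace in an editor, the same transformation can be sketched in a few lines of Python. This is only an illustration under a few assumptions: the input is pasted \dt output in the layout shown above, the table names don't need quoting (as in this answer), and `build_count_query` is a made-up helper name.

```python
import re

# Sample psql \dt rows in the layout shown in the answer above.
DT_OUTPUT = """\
 public | auth_group                     | table | axrsosvelhutvw
 public | auth_group_permissions         | table | axrsosvelhutvw
 public | django_session                 | table | axrsosvelhutvw
"""

# Same pattern as in the answer: capture the second column of "table" rows.
ROW = re.compile(r"^[^|]*\|\s+([^|]*?)\s+\| table \|.*$")


def build_count_query(dt_output: str) -> str:
    """Turn pasted \\dt rows into one 'select ... union select ...;' statement."""
    selects = []
    for line in dt_output.splitlines():
        m = ROW.match(line)
        if m:  # headers, separators and non-table relations are skipped
            name = m.group(1)
            selects.append(f"select '{name}', count(*) from {name}")
    if not selects:
        return ""
    # Joining puts "union" only between statements, so there is no
    # trailing one to delete by hand.
    return " union\n".join(selects) + ";"


print(build_count_query(DT_OUTPUT))
```

The printed statement can then be pasted into psql as described above.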

  • "Don't forget to remove the last `union` before the semicolon" is what I meant :) I added the word "last" to clarify. (2 upvotes)

est*_*ani 12

Extracted from my comment on GregSmith's answer, to make it more readable:

with tbl as (
  SELECT table_schema,table_name 
  FROM information_schema.tables
  WHERE table_name not like 'pg_%' AND table_schema IN ('public')
)
SELECT 
  table_schema, 
  table_name, 
  (xpath('/row/c/text()', 
    query_to_xml(format('select count(*) AS c from %I.%I', table_schema, table_name), 
    false, 
    true, 
    '')))[1]::text::int AS rows_n 
FROM tbl ORDER BY 3 DESC;

Thanks to @a_horse_with_no_name


小智 10

Two simple steps:
(Note: no need to change anything, just copy and paste)
1. Create the function

create function 
cnt_rows(schema text, tablename text) returns integer
as
$body$
declare
  result integer;
  query varchar;
begin
  query := 'SELECT count(1) FROM ' || schema || '.' || tablename;
  execute query into result;
  return result;
end;
$body$
language plpgsql;

2. Run this query to get the total row count across all tables

select sum(cnt_rows) as total_no_of_rows from (select 
  cnt_rows(table_schema, table_name)
from information_schema.tables
where 
  table_schema not in ('pg_catalog', 'information_schema') 
  and table_type='BASE TABLE') as subq;



To get the row counts table by table

select
  table_schema,
  table_name, 
  cnt_rows(table_schema, table_name)
from information_schema.tables
where 
  table_schema not in ('pg_catalog', 'information_schema') 
  and table_type='BASE TABLE'
order by 3 desc;


Ste*_*-au 9

Not sure if an answer in bash is acceptable to you, but FWIW...

PGCOMMAND=" psql -h localhost -U fred -d mydb -At -c \"
            SELECT   table_name
            FROM     information_schema.tables
            WHERE    table_type='BASE TABLE'
            AND      table_schema='public'
            \""
TABLENAMES=$(export PGPASSWORD=test; eval "$PGCOMMAND")

for TABLENAME in $TABLENAMES; do
    PGCOMMAND=" psql -h localhost -U fred -d mydb -At -c \"
                SELECT   '$TABLENAME',
                         count(*) 
                FROM     $TABLENAME
                \""
    eval "$PGCOMMAND"
done

  • In essence, this just boils down to the same `select count(*) from table_name;` as in the OP! (7 upvotes)

Sye*_*taq 8

You can use this query to generate the statements that return all table names with their counts:

select ' select  '''|| tablename  ||''', count(*) from ' || tablename ||' 
union' from pg_tables where schemaname='public'; 

The result of the above query will be:

select  'dim_date', count(*) from dim_date union 
select  'dim_store', count(*) from dim_store union
select  'dim_product', count(*) from dim_product union
select  'dim_employee', count(*) from dim_employee union

You'll need to remove the last union and add a semicolon at the end:

select  'dim_date', count(*) from dim_date union 
select  'dim_store', count(*) from dim_store union
select  'dim_product', count(*) from dim_product union
select  'dim_employee', count(*) from dim_employee;

RUN !!!


Yur*_*sky 7

I usually don't rely on statistics, especially in PostgreSQL.

SELECT table_name, dsql2('select count(*) from '||table_name) as rownum
FROM information_schema.tables
WHERE table_type='BASE TABLE'
    AND table_schema='livescreen'
ORDER BY 2 DESC;

where dsql2 is this helper function, which has to be created first:

CREATE OR REPLACE FUNCTION dsql2(i_text text)
  RETURNS int AS
$BODY$
Declare
  v_val int;
BEGIN
  execute i_text into v_val;
  return v_val;
END; 
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;


Gna*_*nam 6

I don't remember the URL I collected this from, but I hope it helps you:

CREATE TYPE table_count AS (table_name TEXT, num_rows INTEGER); 

CREATE OR REPLACE FUNCTION count_em_all () RETURNS SETOF table_count  AS '
DECLARE 
    the_count RECORD; 
    t_name RECORD; 
    r table_count%ROWTYPE; 

BEGIN
    FOR t_name IN 
        SELECT 
            c.relname
        FROM
            pg_catalog.pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE 
            c.relkind = ''r''
            AND n.nspname = ''public'' 
        ORDER BY 1 
        LOOP
            FOR the_count IN EXECUTE ''SELECT COUNT(*) AS "count" FROM '' || t_name.relname 
            LOOP 
            END LOOP; 

            r.table_name := t_name.relname; 
            r.num_rows := the_count.count; 
            RETURN NEXT r; 
        END LOOP; 
        RETURN; 
END;
' LANGUAGE plpgsql; 

Executing select count_em_all(); should get you the row count of all your tables.


小智 6

This worked for me:

SELECT schemaname,relname,n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC;


小智 5

I made a small variation to include all tables, including those outside the public schema.

CREATE TYPE table_count AS (table_schema TEXT,table_name TEXT, num_rows INTEGER); 

CREATE OR REPLACE FUNCTION count_em_all () RETURNS SETOF table_count  AS '
DECLARE 
    the_count RECORD; 
    t_name RECORD; 
    r table_count%ROWTYPE; 

BEGIN
    FOR t_name IN 
        SELECT table_schema,table_name
        FROM information_schema.tables
        where table_schema !=''pg_catalog''
          and table_schema !=''information_schema''
        ORDER BY 1,2
        LOOP
            FOR the_count IN EXECUTE ''SELECT COUNT(*) AS "count" FROM '' || t_name.table_schema||''.''||t_name.table_name
            LOOP 
            END LOOP; 

            r.table_schema := t_name.table_schema;
            r.table_name := t_name.table_name; 
            r.num_rows := the_count.count; 
            RETURN NEXT r; 
        END LOOP; 
        RETURN; 
END;
' LANGUAGE plpgsql; 

Use select count_em_all(); to call it.

Hope you find this useful. Paul