sys.partitions row counts are badly wrong - how can I correct them?

Nei*_*l P 5 sql-server partitioning azure-sql-data-warehouse

Querying sys.partitions can return an approximate row count for a table.

I have noticed that it returns the same row count for every partition, regardless of the actual contents (even for empty partitions).

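The table has a clustered columnstore index, and statistics have been created on almost every column. Statistics are updated daily, after each data load. The table is partitioned by date.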

The sys.partitions query:

    SELECT   convert(date, convert(varchar,rv.[value])) as partitionDate, p.rows as syspartitions_RowCount
        FROM       sys.tables t     
        join       sys.schemas  sc on sc.schema_id = t.schema_id        
        JOIN        sys.partitions p                ON      p.[object_id]         = t.[object_id]
        JOIN        sys.indexes i                   ON      i.[object_id]         = p.[object_id]
                                                    AND     i.[index_id]          = p.[index_id]
        JOIN        sys.data_spaces ds              ON      ds.[data_space_id]    = i.[data_space_id]
        LEFT JOIN   sys.partition_schemes ps        ON      ps.[data_space_id]    = ds.[data_space_id]
        LEFT JOIN   sys.partition_functions pf      ON      pf.[function_id]      = ps.[function_id]
        LEFT JOIN   sys.partition_range_values rv   ON      rv.[function_id]      = pf.[function_id]
                                                    AND     rv.[boundary_id]+1      = p.[partition_number]
        WHERE   p.[index_id] <=1
                and t.[name] ='tbl'
                and sc.name = 'temp'
                and convert(date, convert(varchar,rv.[value])) > '2016-05-31'
                order by convert(date, convert(varchar,rv.[value])), 
                t.[name]

The query against the table itself:

    select date, count_big(*) as real_count
    from temp.tbl
    where date > '2016-05-31'
    group by date
    order by date

Sample results from both queries:

[image: sample results of the two queries; not reproduced here]

Han*_*non 6

Try using sys.dm_db_partition_stats instead of sys.partitions, as in:

SELECT ObjectName = QUOTENAME(sc.name) + '.' + QUOTENAME(t.name)
    , RangeValue = rv.value
    , sys_partitions_RowCount = p.rows
    , sys_dm_db_partition_stats_row_count = ddps.row_count
FROM sys.tables t 
    INNER JOIN sys.schemas sc ON t.schema_id = sc.schema_id 
    INNER JOIN sys.partitions p ON t.object_id = p.object_id
    INNER JOIN sys.indexes i ON t.object_id = i.object_id
        AND p.index_id = i.index_id
    INNER JOIN sys.data_spaces ds ON ds.data_space_id = i.data_space_id
    INNER JOIN sys.partition_schemes ps ON ps.data_space_id = ds.data_space_id
    INNER JOIN sys.partition_functions pf ON pf.function_id = ps.function_id
    INNER JOIN sys.partition_range_values rv ON rv.function_id = pf.function_id
        AND (rv.boundary_id + 1) = p.partition_number
    INNER JOIN sys.dm_db_partition_stats ddps ON t.object_id = ddps.object_id 
        AND p.partition_id = ddps.partition_id
WHERE p.index_id <= 1
    and t.name ='tbl'
    and sc.name = 'temp'
ORDER BY sc.name
    , t.name
    , rv.value;

For Azure SQL Data Warehouse, you need to use sys.dm_pdw_nodes_db_partition_stats instead of sys.dm_db_partition_stats, even though they contain the same details.

Note that I removed the CONVERT(date, ...) call, so the query in this answer works with all partition schemes, not only those with date range values.
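For the Azure SQL Data Warehouse case mentioned above, a rough sketch of a per-partition count might look like the following. It is not part of the original answer and has not been run against the asker's table. It assumes the usual mapping views, sys.pdw_table_mappings and sys.pdw_nodes_tables, are used to tie the logical table to its per-distribution physical tables, and it sums the per-distribution counts for each partition number.

```sql
-- Sketch only (assumption: the standard PDW mapping views are available).
-- Each logical table is stored as one physical table per distribution,
-- so the per-distribution row counts are summed per partition number.
SELECT nps.partition_number
    , SUM(nps.row_count) AS sys_dm_pdw_nodes_db_partition_stats_row_count
FROM sys.schemas sc
    INNER JOIN sys.tables t ON t.schema_id = sc.schema_id
    INNER JOIN sys.pdw_table_mappings tm ON tm.object_id = t.object_id
    INNER JOIN sys.pdw_nodes_tables nt ON nt.name = tm.physical_name
    INNER JOIN sys.dm_pdw_nodes_db_partition_stats nps ON nps.object_id = nt.object_id
        AND nps.pdw_node_id = nt.pdw_node_id
        AND nps.distribution_id = nt.distribution_id
WHERE nps.index_id <= 1
    AND t.name = 'tbl'
    AND sc.name = 'temp'
GROUP BY nps.partition_number
ORDER BY nps.partition_number;
```

The partition numbers can be mapped back to boundary values by joining sys.partition_range_values on boundary_id + 1, as in the query earlier in this answer.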

In on-premises versions of SQL Server, sys.partitions gets its row count from the internal table ALUCOUNT, or from sys.sysrowsets if ALUCOUNT.rows is NULL. The definition of sys.partitions is:

CREATE VIEW sys.partitions AS
    SELECT rs.rowsetid AS partition_id
        , rs.idmajor AS object_id
        , rs.idminor AS index_id
        , rs.numpart AS partition_number
        , rs.rowsetid AS hobt_id
        , isnull(ct.rows, rs.rcrows) AS rows
        , rs.fgidfs AS filestream_filegroup_id
        , cmprlevel AS data_compression
        , cl.name AS data_compression_desc
    FROM sys.sysrowsets rs OUTER APPLY OpenRowset(TABLE ALUCOUNT, rs.rowsetid, 0, 0) ct
    LEFT JOIN sys.syspalvalues cl ON cl.class = 'CMPL' AND cl.value = cmprlevel

The on-premises version of sys.dm_db_partition_stats gets its row count from a different internal table, PARTITIONCOUNTS:

CREATE VIEW sys.dm_db_partition_stats AS
    SELECT c.partition_id
        , i.object_id
        , i.index_id
        , c.partition_number
        , c.in_row_data_page_count
        , c.in_row_used_page_count
        , c.in_row_reserved_page_count
        , c.lob_used_page_count
        , c.lob_reserved_page_count
        , c.row_overflow_used_page_count
        , c.row_overflow_reserved_page_count
        , c.used_page_count
        , c.reserved_page_count
        , c.row_count
FROM sys.indexes$ i 
CROSS APPLY OpenRowSet(TABLE PARTITIONCOUNTS, i.object_id, i.index_id, i.rowset) c

Although both sys.partitions and sys.dm_db_partition_stats should report the correct row count, I put more trust in the PARTITIONCOUNTS internal table.