cro*_*sek 4 sql-server columnstore sql-server-2019
Given a simple rowstore table with no primary key but with a rowstore clustered index, as follows:
create clustered index [CX_PropertyValue] ON [dbo].[PropertyValue] ([PropertyId], [Value])
I then want to add a columnstore index whose segments follow the same order as the clustered index above:
create nonclustered columnstore index CS_IX_PropertyValue on dbo.PropertyValue(
PropertyId, Value
)
with (drop_existing = on, maxdop = 1); -- maxdop=1 to preserve the order by property
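The MAXDOP hint for preserving order came from: here
I then use the following query to report the min/max data_id for the PropertyId column. It reports the full range for each of the 7 segments: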
create view [Common].[ColumnStoreSegmentationView]
as
/*---------------------------------------------------------------------------------------------------------------------
Purpose: List ColumnStore table segment min/max of columns.
Source: https://joyfulcraftsmen.com/blog/cci-how-to-load-data-for-better-columnstore-segment-elimination/
https://dba.stackexchange.com/a/268329/9415
Modified By Description
---------- ---------- -----------------------------------------------------------------------------------------
2020.06.02 crokusek/inet Initial Version
---------------------------------------------------------------------------------------------------------------------*/
select --top 20000000000
s.Name as SchemaName,
t.Name as TableName,
i.Name as IndexName,
c.name as ColumnName,
c.column_id as ColumnId,
cs.segment_id as SegmentId,
cs.min_data_id as MinValue,
cs.max_data_id as MaxValue
from sys.schemas s
join sys.tables t
on t.schema_id = s.schema_id
join sys.partitions as p
on p.object_id = t.object_id
join sys.indexes as i
on i.object_id = p.object_id
and i.index_id = p.index_id
join sys.index_columns as ic
on ic.[object_id] = i.[object_id]
and ic.index_id = i.index_id
join sys.columns c
on c.object_id = t.object_id
and c.column_id = ic.column_id
join sys.column_store_segments cs
on cs.hobt_id = p.hobt_id
and cs.column_id = ic.index_column_id
--order by s.Name, t.Name, i.Name, c.Name, cs.Segment_Id
GO
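I tried making the clustered index unique, which did slightly change the reported ranges, but they are still not monotonically increasing.
Any ideas?
Here is a link where segmentation was achieved this way, but I don't see any difference.
Version: Microsoft SQL Server 2019 (RTM) - 15.0.2000.5 (X64)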
Nonclustered columnstore indexes do not directly support this.
It does work for clustered columnstore.
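As a rough sketch of the clustered columnstore case: assuming the rowstore clustered index CX_PropertyValue from the question is still in place, that the table has no other columnstore index, and that all of its column types are supported by columnstore (none of which is confirmed here), the rowstore clustered index could be converted in place so that the columnstore is built from rows read in (PropertyId, Value) order. As with the nonclustered case, this is likely, but not guaranteed, to preserve the order.

```sql
-- Sketch only: converts the existing rowstore clustered index into a
-- clustered columnstore index of the same name.
-- MAXDOP = 1 avoids a parallel build, which would interleave the rows.
CREATE CLUSTERED COLUMNSTORE INDEX CX_PropertyValue
ON dbo.PropertyValue
WITH (DROP_EXISTING = ON, MAXDOP = 1);
```

Note that this replaces the table's rowstore clustered index, so it is a different design from the one asked about in the question.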
Azure Synapse Analytics has language support to do this in a single step, for example:
CREATE CLUSTERED COLUMNSTORE INDEX <index_name>
ON dbo.PropertyValue
ORDER (PropertyId, Value);
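This syntax has not yet arrived in the SQL Server box product, although it is available under an undocumented feature flag, so perhaps it is not far off. Even so, it still would not apply to nonclustered columnstore indexes.
The best you can do is create a nonclustered rowstore index with MAXDOP = 1, then replace it with the nonclustered columnstore index using MAXDOP = 1 and DROP_EXISTING = ON.
This is not guaranteed to preserve the ordering you need, but it is very likely to: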
CREATE NONCLUSTERED INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (MAXDOP = 1);
CREATE NONCLUSTERED COLUMNSTORE INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (DROP_EXISTING = ON, MAXDOP = 1);
This gives you the best chance of achieving rowgroup elimination when filtering on PropertyId.
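One way to check whether rowgroup elimination actually happens is to run a filtered query with STATISTICS IO enabled and read the segment counts in the messages output. The following is a minimal sketch, assuming PropertyId is an integer column; the filter value 42 is purely hypothetical, and the index hint is there only to force the columnstore index for the test.

```sql
SET STATISTICS IO ON;

-- Force the nonclustered columnstore index so its segment counts are reported.
SELECT COUNT_BIG(*)
FROM dbo.PropertyValue AS PV WITH (INDEX (CS_IX_PropertyValue))
WHERE PV.PropertyId = 42; -- hypothetical value, replace with one present in your data

SET STATISTICS IO OFF;
```

The messages output for the table should include a line reporting segment reads and segments skipped. A non-zero skipped count indicates that some rowgroups were eliminated. If every segment covers the full PropertyId range, the skipped count would be expected to stay at zero.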
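When the desired ordering matches the rowstore clustered index (as in the question), there is no need to create a rowstore nonclustered index first. The documentation says:
Note that for a nonclustered columnstore index (NCCI), if the base rowstore table has a clustered index, the rows are already ordered. In this case, the resulting nonclustered columnstore index is automatically ordered.
So in your case, it is enough to run: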
CREATE NONCLUSTERED COLUMNSTORE INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (MAXDOP = 1);
See this db<>fiddle demo.
You can see the minimum and maximum values for each rowgroup and column using:
SELECT
CSS.column_id,
column_name = C.[name],
rowgroup_id = CSS.segment_id,
CSS.min_data_id,
CSS.max_data_id,
CSS.row_count
FROM sys.partitions AS P
JOIN sys.column_store_segments AS CSS
ON CSS.hobt_id = P.hobt_id
JOIN sys.indexes AS I
ON I.[object_id] = P.[object_id]
AND I.index_id = P.index_id
JOIN sys.index_columns AS IC
ON IC.[object_id] = I.[object_id]
AND IC.index_id = I.index_id
AND IC.index_column_id = CSS.column_id
JOIN sys.columns AS C
ON C.[object_id] = P.[object_id]
AND C.column_id = IC.column_id
WHERE
P.[object_id] = OBJECT_ID(N'dbo.PropertyValue', N'U')
ORDER BY
C.column_id,
CSS.segment_id;
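To test the "monotonically increasing" property from the question directly, the same catalog views can be combined with LAG to compare each segment's minimum against the previous segment's maximum. This is a sketch under assumptions: it uses the index name CS_IX_PropertyValue and the column PropertyId from the question, and it assumes min_data_id and max_data_id are comparable across segments for that column (typically the case for integer columns, but not for dictionary-encoded strings). The output column names are illustrative.

```sql
WITH SegmentRange AS
(
    SELECT
        P.partition_number,
        CSS.segment_id,
        CSS.min_data_id,
        CSS.max_data_id,
        -- Maximum of the preceding segment within the same partition
        previous_max_data_id =
            LAG(CSS.max_data_id) OVER (
                PARTITION BY P.partition_number
                ORDER BY CSS.segment_id)
    FROM sys.partitions AS P
    JOIN sys.column_store_segments AS CSS
        ON CSS.hobt_id = P.hobt_id
    JOIN sys.indexes AS I
        ON I.[object_id] = P.[object_id]
        AND I.index_id = P.index_id
    JOIN sys.index_columns AS IC
        ON IC.[object_id] = I.[object_id]
        AND IC.index_id = I.index_id
        AND IC.index_column_id = CSS.column_id
    JOIN sys.columns AS C
        ON C.[object_id] = P.[object_id]
        AND C.column_id = IC.column_id
    WHERE
        P.[object_id] = OBJECT_ID(N'dbo.PropertyValue', N'U')
        AND I.[name] = N'CS_IX_PropertyValue'
        AND C.[name] = N'PropertyId'
)
SELECT
    partition_number,
    segment_id,
    min_data_id,
    max_data_id,
    previous_max_data_id,
    -- 1 when this segment starts below the previous segment's maximum
    overlaps_previous =
        CASE WHEN min_data_id < previous_max_data_id THEN 1 ELSE 0 END
FROM SegmentRange
ORDER BY
    partition_number,
    segment_id;
```

If the build preserved the order, overlaps_previous should be 0 for every row. The comparison is strict because a PropertyId value that spans a rowgroup boundary legitimately appears as the maximum of one segment and the minimum of the next.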