Memory-optimized table intermittently shows very slow performance

Jur*_*obl 5 performance sql-server memory-optimized-tables sql-server-2019

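I have a memory-optimized table with (in my tests) 10 million rows. When I run some UPDATE statements against it in parallel, I get intermittent CPU spikes and huge spikes in query execution time (from 1 ms to more than 1 second). The CPU spikes are caused by SQL Server. During these spikes all queries are slow, even ones that do not touch the same table, or a simple GETUTCDATE call.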

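It runs on SQL Server 2019 CU10 with 8 cores and 100 GB of RAM. I have also run it on a bare-metal server (24 cores), a fresh VM, and my laptop, with more or less the same results. The table has 10 million rows on my test system and 40 million in prod. It has four NONCLUSTERED indexes and two NONCLUSTERED HASH indexes; the hash indexes have no long chains and plenty of unused buckets.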
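
The claim about chain lengths and unused buckets can be checked against the hash index DMV. The following is a minimal sketch, assuming the test table is named dbo.ContainerAutoTest as in the script further down; the empty_bucket_pct column is a derived value added here for illustration only.

```sql
-- Bucket usage and chain lengths for the hash indexes on the test table.
-- Run in the database that contains the memory-optimized table.
SELECT
    i.name AS index_name,
    s.total_bucket_count,
    s.empty_bucket_count,
    -- Share of buckets that hold no rows (derived column, illustrative)
    CAST(100.0 * s.empty_bucket_count / s.total_bucket_count AS decimal(5, 2)) AS empty_bucket_pct,
    s.avg_chain_length,
    s.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id
   AND i.index_id = s.index_id
WHERE s.object_id = OBJECT_ID(N'dbo.ContainerAutoTest');
```

Note that this DMV scans the whole table, so it is better run outside the load test than during it.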

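Prod uses many In-Memory tables, foreign keys, and natively compiled procedures, so I had some trouble reducing the problem while keeping exactly the same characteristics. In my repro I run 10 threads in parallel from C#, each waiting 1-5 ms between updates: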

UPDATE ContainerAutoTest SET LastComment = 'TestEntry' WHERE Name = @TestContainer

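Each thread has its own row to update, so there should be no conflicts at the row level. It makes no difference whether I run the queries from the same machine or over the network.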

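In prod we see a spike roughly every 30 seconds while data is being written. In my tests I can reproduce a spike about every 20 seconds, or even every 2 seconds, depending on the machine, the number of threads I run, and the wait time between updates.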

[Figure: CPU spikes in prod] [Figure: CPU spikes reproduced]

I have built a completely new table with which I can reproduce the problem:

CREATE TABLE [dbo].[ContainerAutoTest]
(
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [ResourceId] [int] NULL,
    [CurrentStepId] [int] NULL,
    [ProductId] [int] NULL,
    [Level] [nvarchar](50) NULL,
    [LastComment] [nvarchar](100) NULL,
    [SysStart] [datetime2](7) NULL,
    [Name] NVARCHAR(100) NOT NULL,
INDEX [Ix_CurrentStep] NONCLUSTERED([CurrentStepId] ASC),
INDEX [Ix_Level] NONCLUSTERED([Level] ASC),
INDEX [Ix_Product] NONCLUSTERED([ProductId] ASC),
INDEX [Ix_Resource] NONCLUSTERED([ResourceId] ASC),
PRIMARY KEY NONCLUSTERED HASH([Id]) WITH ( BUCKET_COUNT = 16777216),
UNIQUE NONCLUSTERED HASH([Name]) WITH ( BUCKET_COUNT = 16777216)
)
WITH ( MEMORY_OPTIMIZED = ON , DURABILITY = SCHEMA_AND_DATA )
GO
ALTER TABLE [dbo].[ContainerAutoTest] ADD  DEFAULT ('Modul') FOR [Level]
GO
ALTER TABLE [dbo].[ContainerAutoTest] ADD  DEFAULT (SYSUTCDATETIME()) FOR [SysStart]
GO

PRINT('Clearing table...')
DELETE FROM [dbo].[ContainerAutoTest] WHERE 1=1
GO
PRINT('Filling table...')
DECLARE @RowsToInsert INT = 10;
DECLARE @RowsInserted INT = 0;
WHILE @RowsInserted < @RowsToInsert
BEGIN
    INSERT INTO ContainerAutoTest (Name)
        SELECT TOP 1000000 NEWID()
            FROM sys.all_columns ac1
                CROSS JOIN sys.all_columns ac2;
    SET @RowsInserted = @RowsInserted + 1;
END
PRINT('Filled table')
GO

I had some trouble reproducing the exact problem with SQL alone, so I am currently running the tests from C#.

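The database was created mostly with default settings. Here is the exported CREATE script, from which I tried to remove most of the defaults that should not affect the problem.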

CREATE DATABASE [TestDbSlowInMemory]
    ON  PRIMARY
    ( NAME = N'TestDbSlowInMemory', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\DATA\TestDbSlowInMemory'  ),
    FILEGROUP [TestDbSlowInMemory] CONTAINS MEMORY_OPTIMIZED_DATA  DEFAULT
    ( NAME = N'TestDbSlowInMemoryInMemory', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\DATA\TestDbSlowInMemoryInMemory' )
    LOG ON
    ( NAME = N'TestDbSlowInMemory_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\MSSQL\DATA\TestDbSlowInMemory_log'  )

ALTER DATABASE [TestDbSlowInMemory] SET AUTO_UPDATE_STATISTICS ON 
GO
ALTER DATABASE [TestDbSlowInMemory] SET AUTO_UPDATE_STATISTICS_ASYNC ON 
GO
ALTER DATABASE [TestDbSlowInMemory] SET ALLOW_SNAPSHOT_ISOLATION ON 
GO
ALTER DATABASE [TestDbSlowInMemory] SET READ_COMMITTED_SNAPSHOT ON 
GO
ALTER DATABASE [TestDbSlowInMemory] SET MULTI_USER 
GO
ALTER DATABASE [TestDbSlowInMemory] SET PAGE_VERIFY CHECKSUM  
GO
ALTER DATABASE [TestDbSlowInMemory] SET FILESTREAM( NON_TRANSACTED_ACCESS = OFF ) 
GO
ALTER DATABASE [TestDbSlowInMemory] SET TARGET_RECOVERY_TIME = 60 SECONDS 
GO
ALTER DATABASE [TestDbSlowInMemory] SET DELAYED_DURABILITY = DISABLED 
GO
ALTER DATABASE [TestDbSlowInMemory] SET ACCELERATED_DATABASE_RECOVERY = OFF  
GO
ALTER DATABASE [TestDbSlowInMemory] SET QUERY_STORE = ON
GO
ALTER DATABASE [TestDbSlowInMemory] SET QUERY_STORE (OPERATION_MODE = READ_WRITE, CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30), DATA_FLUSH_INTERVAL_SECONDS = 900, INTERVAL_LENGTH_MINUTES = 60, MAX_STORAGE_SIZE_MB = 100, QUERY_CAPTURE_MODE = ALL, SIZE_BASED_CLEANUP_MODE = AUTO, MAX_PLANS_PER_QUERY = 200, WAIT_STATS_CAPTURE_MODE = ON)
GO
USE [TestDbSlowInMemory]
GO
ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON;
GO
ALTER DATABASE [TestDbSlowInMemory] SET READ_WRITE 
GO

小智 1

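I would need to know more, but could these spikes be garbage collection? The server has to discard row versions that are no longer active. We run a lot of large updates on some big tables (10K updates, 20K rows per update, every minute), and we occasionally see CPU spikes that we attribute to GC.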
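
One way to test the GC hypothesis is to sample the In-Memory OLTP garbage collection DMVs shortly before and after a spike and compare the counters. This is only a sketch of what to look at, not a confirmed diagnosis: a correlation between GC activity and the spikes would support the idea, but would not prove GC is the cause.

```sql
-- Instance-wide GC counters; take one sample before a spike and one after,
-- then compare how much the counters moved.
SELECT * FROM sys.dm_xtp_gc_stats;

-- Per-queue GC work items. A current_queue_depth that keeps growing
-- would suggest that garbage collection is falling behind the write load.
SELECT queue_id,
       total_enqueues,
       total_dequeues,
       current_queue_depth,
       maximum_queue_depth
FROM sys.dm_xtp_gc_queue_stats
ORDER BY queue_id;

-- Recent GC cycles for the current database (run inside the test database).
SELECT * FROM sys.dm_db_xtp_gc_cycle_stats;
```

If the counters barely change across a spike, GC becomes a less likely explanation and other periodic work would be worth examining instead.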