Will an index on a VARCHAR column with many very similar starting values perform poorly?

uri*_*ium 6 index sql-server-2005 sql-server-2008 sql-server sql-server-2008-r2

We seem to be getting unusually bad performance on queries that use an index. For example, the table looks like:

  • PK BIGINT
  • ID VARCHAR(50)
  • Col1
  • Col2
  • etc.

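We insert a row into the database and then look it up by ID. The ID belongs to a third party, while the PK is ours, and we need to get the PK back. However, a large proportion of these IDs have very similar starting values, for example: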

  • "//45-423484834893457"
  • "//45-573459834589345"
  • "//45-345345345345345"

I'm not sure how SQL Server traverses the B-tree: does it hash the values, or does it do string comparisons starting from the leftmost position?

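When querying these values, could having a very large number of very similar values (at least the first 4 characters are identical) cause poor index performance?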
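For context on the traversal question: a SQL Server rowstore index keeps its keys in sorted order, and a seek compares the search value against keys on the way down the tree rather than hashing it. The Python sketch below is only an analogy, assuming a plain binary search over a sorted list as a stand-in for descending a balanced B-tree, and using Python's code-point string ordering instead of a SQL Server collation. It counts how many key comparisons a lookup needs when every key shares the `//45-` prefix versus when no prefix is shared. `make_ids` and `seek` are hypothetical helpers, not anything from SQL Server.

```python
import random

def make_ids(n, prefix, seed=0):
    """Return n distinct, sorted IDs made of `prefix` followed by 15 random
    digits (loosely modelled on the "//45-..." values in the question)."""
    rng = random.Random(seed)
    ids = set()
    while len(ids) < n:
        ids.add(prefix + "".join(rng.choice("0123456789") for _ in range(15)))
    return sorted(ids)

def seek(sorted_keys, target):
    """Binary search standing in for a B-tree descent.

    Returns (found, comparisons). The number of comparisons depends only on
    how many keys there are, not on whether they share a prefix."""
    lo, hi = 0, len(sorted_keys)
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_keys[mid] < target:  # ordinary left-to-right string comparison
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == target
    return found, comparisons

shared = make_ids(20000, "//45-")  # every key shares a 5-character prefix
plain = make_ids(20000, "")        # no shared prefix
for name, keys in (("shared prefix", shared), ("no prefix", plain)):
    worst = max(seek(keys, k)[1] for k in keys[::997])
    print(name, "worst-case comparisons:", worst, "bound:", len(keys).bit_length())
```

In this model the shared prefix only means each individual comparison reads a few more characters before it finds a difference; the number of comparisons (the analogue of index levels visited) stays bounded by the key count alone.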

Update:

Sorry, the lookup query is:

SELECT PK_Column FROM table WHERE ID = @ID
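To make the insert-then-look-up pattern concrete, here is a small reproduction using Python's built-in `sqlite3` module. It is SQLite, not SQL Server, so it is only an analogy: the table and index names mirror the `Pdu` / `IX_Pdu_RemoteMsgId` objects in the plan posted in this thread, but the schema is an assumption. It inserts IDs with the shared `//45-` prefix, looks up our key by the third-party ID, and asks SQLite for its plan, whose `SEARCH ... USING ... INDEX` line is the counterpart of the Index Seek in the showplan. `lookup_pk` is a hypothetical helper.

```python
import sqlite3

# Hypothetical reproduction of the question's table in SQLite (not SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Pdu ("
    " LocalMsgId INTEGER PRIMARY KEY,"     # our key (the PK we need back)
    " RemoteMsgId VARCHAR(50) NOT NULL)"   # the third party's ID
)
conn.execute("CREATE INDEX IX_Pdu_RemoteMsgId ON Pdu (RemoteMsgId)")

sample_ids = [
    "//45-423484834893457",
    "//45-573459834589345",
    "//45-345345345345345",
    "41/00/2789aeb8/1127796335811",
]
conn.executemany("INSERT INTO Pdu (RemoteMsgId) VALUES (?)", [(i,) for i in sample_ids])

LOOKUP = "SELECT LocalMsgId FROM Pdu WHERE RemoteMsgId = ?"

def lookup_pk(remote_id):
    """Return our PK for a third-party ID, or None if it is not present."""
    row = conn.execute(LOOKUP, (remote_id,)).fetchone()
    return row[0] if row else None

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, ("x",))]
print(plan)
print(lookup_pk("//45-573459834589345"))  # 2
```

The exact wording of SQLite's plan text varies between versions, which is why the sketch prints it rather than assuming a fixed string.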

As Mark requested:

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 22 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 30 ms.

(1 row(s) affected)
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

(1 row(s) affected)
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.0.4000.0" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence>
    <Batch>
      <Statements>
        <StmtSimple StatementCompId="1" StatementEstRows="1" StatementId="1" StatementOptmLevel="TRIVIAL" StatementSubTreeCost="0.0032831" StatementText="SELECT&#xD;&#xA;      LocalMsgId&#xD;&#xA;    FROM&#xD;&#xA;      Pdu (nolock)&#xD;&#xA;  WHERE&#xD;&#xA;     RemoteMsgId = '41/00/2789aeb8/1127796335811'&#xD;&#xA;      &#xD;" StatementType="SELECT" ParameterizedText="(@1 varchar(8000))SELECT [LocalMsgId] FROM [Pdu](nolock) WHERE [RemoteMsgId]=@1" QueryHash="0x677C78E75E33C4C7" QueryPlanHash="0xB358D862A43E4853">
          <StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
          <QueryPlan CachedPlanSize="16" CompileTime="7406" CompileCPU="1970" CompileMemory="120">
            <RelOp AvgRowSize="23" EstimateCPU="0.0001581" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="1" LogicalOp="Index Seek" NodeId="0" Parallel="false" PhysicalOp="Index Seek" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="5074270">
              <OutputList>
                <ColumnReference Database="[smpp]" Schema="[dbo]" Table="[Pdu]" Column="LocalMsgId" />
              </OutputList>
              <IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" NoExpandHint="false">
                <DefinedValues>
                  <DefinedValue>
                    <ColumnReference Database="[smpp]" Schema="[dbo]" Table="[Pdu]" Column="LocalMsgId" />
                  </DefinedValue>
                </DefinedValues>
                <Object Database="[smpp]" Schema="[dbo]" Table="[Pdu]" Index="[IX_Pdu_RemoteMsgId]" IndexKind="NonClustered" />
                <SeekPredicates>
                  <SeekPredicateNew>
                    <SeekKeys>
                      <Prefix ScanType="EQ">
                        <RangeColumns>
                          <ColumnReference Database="[smpp]" Schema="[dbo]" Table="[Pdu]" Column="RemoteMsgId" />
                        </RangeColumns>
                        <RangeExpressions>
                          <ScalarOperator ScalarString="[@1]">
                            <Identifier>
                              <ColumnReference Column="@1" />
                            </Identifier>
                          </ScalarOperator>
                        </RangeExpressions>
                      </Prefix>
                    </SeekKeys>
                  </SeekPredicateNew>
                </SeekPredicates>
              </IndexScan>
            </RelOp>
            <ParameterList>
              <ColumnReference Column="@1" ParameterCompiledValue="'41/00/2789aeb8/1127796335811'" />
            </ParameterList>
          </QueryPlan>
        </StmtSimple>
      </Statements>
      <Statements>
        <StmtSimple StatementCompId="2" StatementId="2" StatementText="&#xA;SET STATISTICS IO OFF&#xD;&#xA;" StatementType="SET STATS" />
      </Statements>
    </Batch>
    <Batch>
      <Statements>
        <StmtSimple StatementCompId="1" StatementId="1" StatementText="SET STATISTICS TIME OFF&#xD;&#xA;" StatementType="SET STATS" />
      </Statements>
    </Batch>
  </BatchSequence>
</ShowPlanXML>

小智 2

It depends on the query you use. MS SQL Server uses B-tree indexes, which are always balanced, but if you use a query like:

select * from table where field like 'some%'

and most of your records match that condition, MS SQL Server may decide that a table scan is cheaper than an index scan or an index seek.
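A `LIKE` pattern with a constant leading prefix can be answered by seeking to a key range: every match lies between the prefix itself and the first string that sorts after all strings starting with it. With `SELECT *` against a nonclustered index, each matching row also needs a lookup into the base table, so when most rows match, a scan can come out cheaper. The Python sketch below only computes that fraction of matching rows under code-point ordering; the real decision is SQL Server's cost-based choice from its statistics, and no threshold is implied here. The data and helper names are hypothetical.

```python
import bisect

def prefix_range(sorted_keys, prefix):
    """Return (lo, hi) so that sorted_keys[lo:hi] are exactly the keys
    matching LIKE 'prefix%' under plain code-point ordering."""
    if not prefix:
        return 0, len(sorted_keys)  # LIKE '%' matches everything
    lo = bisect.bisect_left(sorted_keys, prefix)
    # First string that sorts after every string starting with `prefix`:
    # bump the last character (assumes it is not the maximum code point).
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    hi = bisect.bisect_left(sorted_keys, upper)
    return lo, hi

def selectivity(sorted_keys, prefix):
    """Fraction of all keys that a LIKE 'prefix%' range would touch."""
    lo, hi = prefix_range(sorted_keys, prefix)
    return (hi - lo) / len(sorted_keys)

# Hypothetical data: 90% of the IDs share the "//45-" prefix.
keys = sorted([f"//45-{i:015d}" for i in range(900)] +
              [f"41/00/{i:08x}" for i in range(100)])

print(selectivity(keys, "//45-"))              # 0.9
print(selectivity(keys, f"//45-{7:015d}"))     # 0.001
```

A prefix that most rows share touches most of the index, while a complete ID narrows the range to a single row, which is the case the question's equality lookup is in.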

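Added: In any case, you can use a computed column that reverses your field's value, so that the characters that vary come first, and create an index on it.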
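In SQL Server this would presumably be a computed column defined as `REVERSE(ID)` with an index on it, and the lookup would compare against `REVERSE(@ID)`; the column name and exact DDL are left open here. The Python sketch below models the idea with a sorted list standing in for the index on the hypothetical reversed column: it shows that reversing removes the shared leading prefix of the sample IDs, and that a lookup must reverse the search value the same way. Reversal only changes which characters are compared first; it does not change how many index levels a seek visits.

```python
import bisect

def shared_prefix_len(sorted_keys):
    """Length of the prefix common to all keys; for a sorted list it is
    enough to compare the first and last key."""
    first, last = sorted_keys[0], sorted_keys[-1]
    n = 0
    while n < min(len(first), len(last)) and first[n] == last[n]:
        n += 1
    return n

# (our PK, third-party ID) pairs; the IDs are taken from the question.
rows = [
    (1, "//45-423484834893457"),
    (2, "//45-573459834589345"),
    (3, "//45-345345345345345"),
]

# Hypothetical "computed column": the reversed ID, kept in a sorted index.
reversed_index = sorted((remote_id[::-1], pk) for pk, remote_id in rows)
reversed_keys = [k for k, _ in reversed_index]

def lookup_pk(remote_id):
    """Seek on the reversed key: the search value is reversed the same way."""
    target = remote_id[::-1]
    i = bisect.bisect_left(reversed_keys, target)
    if i < len(reversed_keys) and reversed_keys[i] == target:
        return reversed_index[i][1]
    return None

print(shared_prefix_len(sorted(r for _, r in rows)))  # 5  ("//45-")
print(shared_prefix_len(reversed_keys))               # 0
print(lookup_pk("//45-573459834589345"))              # 2
```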