Ignite SQL query takes a long time

Question

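We are currently using GridGain Community Edition 8.8.10. We set up the Ignite cluster in Kubernetes using the Ignite operator. The cluster consists of 2 nodes with native persistence enabled, and we connect to it with a thick client. The client is deployed in the same Kubernetes cluster. The cluster's memory configuration is as follows: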

-DIGNITE_WAL_MMAP=false  -DIGNITE_QUIET=false -Xms6g -Xmx6g -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC


<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
    <property name="name" value="Knowledge_Region"/>
    <!-- Memory region of 20 MB initial size. -->
    <property name="initialSize" value="#{20 * 1024 * 1024}"/>
    <!-- Maximum size is 9 GB  -->
    <property name="maxSize" value="#{9L * 1024 * 1024 * 1024}"/>
    <!-- Enabling eviction for this memory region. -->
    <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
    <property name="persistenceEnabled" value="true"/>
    <!-- Enabling SEGMENTED_LRU page replacement for this region.  -->
    <property name="pageReplacementMode" value="SEGMENTED_LRU"/>
</bean>

We query the cache using Ignite string functions. The cache value class has the following structure:

  @QuerySqlField(index = true, inlineSize = 100)
  private String value;

  @QuerySqlField(name = "label", index = true, inlineSize = 100)
  private String label;

  @QuerySqlField(name = "type", index = true, inlineSize = 100)
  @AffinityKeyMapped
  private String type;

  private String typeLabel;
  private List<String> synonyms;

The SQL query we use to fetch the data is:

select _key, _val from TESTCACHEVALUE USE INDEX(TESTCACHEVALUE_label_IDX) WHERE REGEXP_LIKE(label, 'unit.*s.*','i') LIMIT 8

The query plan that is generated:

[05:04:56,613][WARNING][long-qry-#36][LongRunningQueryManager] Query execution is too long [duration=1124ms, type=MAP, distributedJoin=false, enforceJoinOrder=false, lazy=false, schema=staging_infrastructuretesting_business_object, sql='SELECT
"__Z0"."_KEY" AS "__C0_0",
"__Z0"."_VAL" AS "__C0_1"
FROM "staging_infrastructuretesting_business_object"."TESTCACHEVALUE" AS "__Z0" USE INDEX ("TESTCACHEVALUE_LABEL_IDX")
WHERE REGEXP_LIKE("__Z0"."LABEL", 'uni.*', 'i') FETCH FIRST 8 ROWS ONLY', plan=SELECT
    __Z0._KEY AS __C0_0,
    __Z0._VAL AS __C0_1
FROM staging_infrastructuretesting_business_object.TESTCACHEVALUE __Z0 USE INDEX (TESTCACHEVALUE_LABEL_IDX)
    /* staging_infrastructuretesting_business_object.TESTCACHEVALUE.__SCAN_ */
    /* scanCount: 289643 */
    /* lookupCount: 1 */
WHERE REGEXP_LIKE(__Z0.LABEL, 'uni.*', 'i')
FETCH FIRST 8 ROWS ONLY

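As far as I can see, the query performs a full scan instead of using the index specified in the query.

The cache contains 5 million objects.

The cluster's memory statistics are as follows: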

    ^-- Node [id=d87d1212, uptime=00:30:00.229]
    ^-- Cluster [hosts=6, CPUs=20, servers=2, clients=4, topVer=12, minorTopVer=25]
    ^-- Network [addrs=[10.57.5.10, 127.0.0.1], discoPort=47500, commPort=47100]
    ^-- CPU [CPUs=1, curLoad=16%, avgLoad=38.3%, GC=0%]
    ^-- Heap [used=4265MB, free=30.58%, comm=6144MB]
    ^-- Off-heap memory [used=4872MB, free=58.58%, allocated=11564MB]
    ^-- Page memory [pages=620072]
    ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,
      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.96%, allocRam=100MB, allocTotal=0MB]
    ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,
      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.87%, allocRam=0MB, allocTotal=0MB]
    ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,
      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]
    ^--   volatileDsMemPlc region [type=internal, persistence=false, lazyAlloc=true,
      ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=0MB]
    ^--   Default_Region region [type=default, persistence=true, lazyAlloc=true,
      ...  initCfg=20MB, maxCfg=9216MB, usedRam=4781MB, freeRam=48.12%, allocRam=9216MB, allocTotal=4753MB]
    ^-- Ignite persistence [used=4844MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=8, qSize=0]
    ^-- Striped thread pool [active=0, idle=8, qSize=0]

Judging by the memory snapshot, the cluster appears to have enough memory.

What I have tried so far:

Index hint in the query
Applying a LIMIT to the query
Partitioned cache with query parallelism 3
SkipReducerOnUpdate set to true
OnheapCacheEnabled set to true

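I am not sure why the query takes so long. Please let me know if I have missed something.

According to the query execution plan, the query takes about 2 seconds, but the client gets the response in about 5 seconds.

Thanks in advance.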

Answer 1

小智 2

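You seem to be overlooking the fact that the Apache Ignite SQL engine uses a B+Tree data structure internally. A B+Tree relies on some ordering of the stored objects (there has to be a way to compare them). The only kind of text search this structure can serve is a prefix search, because a prefix gives the search algorithm a branching condition. Here is an example: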

select _key, _val from TESTCACHEVALUE WHERE label LIKE 'unit%'

In this case you will see the TESTCACHEVALUE_label_IDX index being used even without a hint.
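
To illustrate why a prefix gives a sorted structure something to branch on, here is a minimal sketch in plain Java. It uses java.util.TreeMap as a stand-in for a sorted index; it is not Ignite's B+Tree, and the class name, method name, and sample labels are made up for illustration. A prefix turns into one contiguous key range, so the lookup only touches the keys inside that range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: TreeMap plays the role of a sorted index on "label".
final class PrefixIndexSketch {

    // Returns the keys that start with the prefix by seeking to the range
    // [prefix, prefix + '\uffff') instead of visiting every key.
    // Assumption: no label has '\uffff' right after the prefix.
    static List<String> prefixScan(NavigableMap<String, Integer> index, String prefix) {
        String upperBound = prefix + Character.MAX_VALUE;
        return new ArrayList<>(index.subMap(prefix, true, upperBound, false).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> index = new TreeMap<>();
        String[] labels = {"zebra", "units", "universe", "unit", "umbrella", "unity", "unitary"};
        for (int i = 0; i < labels.length; i++) {
            index.put(labels[i], i);
        }
        // prints [unit, unitary, units, unity]
        System.out.println(prefixScan(index, "unit"));
    }
}
```

The comparison here is case sensitive, just like the natural ordering of the keys, which is one more reason a case-insensitive pattern cannot be mapped onto such a range.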

In your scenario, REGEXP_LIKE is just Matcher.find() applied to each label, one by one.
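
As a rough picture of what that means, the following plain Java sketch applies Matcher.find() to every label in turn. It is only an illustration of the per-row evaluation, not the engine's actual code path; the class name, method name, and sample data are hypothetical. Because find() matches anywhere in the string, a label such as "community service" also matches 'unit.*s.*', so there is no fixed prefix a sorted index could seek to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a row-by-row regex filter with a LIMIT.
final class RegexScanSketch {

    // Tests each label with Matcher.find() and stops once `limit` hits are
    // collected. If matches are rare, almost every label gets examined.
    static List<String> regexScan(List<String> labels, String regex, int limit) {
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<>();
        for (String label : labels) {
            if (hits.size() >= limit) {
                break;
            }
            if (pattern.matcher(label).find()) {
                hits.add(label);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("Unit Sales", "community service", "universe", "unity");
        // prints [Unit Sales, community service]
        System.out.println(regexScan(labels, "unit.*s.*", 8));
    }
}
```

The LIMIT only helps when matches turn up early; with few matching rows the loop still walks most of the data, which is consistent with the large scanCount in the plan above.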

Try the Ignite Text Query mechanism. It is based on Apache Lucene and looks like a better fit for this case.
