How do I create an index on an XML column in PostgreSQL using an xpath expression?

Question

Kho*_*rak 8 xml postgresql indexing xpath aws-aurora

I get this error when trying to create a btree index on an XML-typed column using an xpath expression on AuroraDB (PostgreSQL 9.6):

ERROR:  could not identify a comparison function for type xml
SQL state: 42883

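This 2009 thread, which has no clear solution, is the only one I found that discusses this error message. It is about creating an xpath-based index on a much earlier version of PostgreSQL: https://www.postgresql-archive.org/Slow-select-times-on-select-with-xpath-td2074839.html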

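In my case I also need to specify namespaces. The original poster in that thread cast the result of the xpath expression to text[], which also fails for me, but why is the cast needed at all? I also don't see PostgreSQL using my index, even though I have thousands of rows to go through.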

So I tried a simpler case, but the error persists. Please elaborate on why, if you can:

CREATE TABLE test
(
    id integer NOT NULL,
    xml_data xml NOT NULL,
    CONSTRAINT test_pkey PRIMARY KEY (id)
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;



CREATE INDEX test_idx
    ON test USING btree 
    (xpath('/book/title', xml_data))

The resulting message is:

ERROR:  could not identify a comparison function for type xml
SQL state: 42883

The database encoding is UTF8. Collation and character type are en_US.UTF-8.

Here are some sample insert statements as well:

insert into source_data.test(id, xml_data) 
values(1, XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>1</chapter><chapter>2</chapter></book>'));

insert into source_data.test(id, xml_data) 
values(2, XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Apropos</title><chapter>1</chapter><chapter>2</chapter></book>'));

Answer 1

Mar*_*kus 5

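You get this error because the XML data type does not provide any comparison operators. A btree index needs a comparison function for the indexed type, so you cannot create an index directly on the result of xpath(), which returns an array of XML values.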
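
As a small illustration of the point above (a sketch, not taken from the original post; the literal values are only examples), comparing xml values directly is rejected, while the same values compare fine once they are cast to text:

```sql
-- Comparing two xml values directly is rejected, because the xml type
-- defines no comparison operators. Uncomment to see the failure:
--   SELECT '<title>Apropos</title>'::xml = '<title>Apropos</title>'::xml;

-- After casting to text, ordinary text comparison applies.
SELECT '<title>Apropos</title>'::xml::text
     = '<title>Apropos</title>'::xml::text AS same_title;

-- The same holds for the array that xpath() returns, once cast to text[].
SELECT cast(xpath('/book/title',
                  '<book><title>Apropos</title></book>'::xml) as text[])
     = '{<title>Apropos</title>}' AS same_title_array;
```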

So, when creating the index, you need to cast the result of the XPath expression to a text array:

CREATE INDEX test_idx
ON test USING BTREE 
    (cast(xpath('/book/title', xml_data) as text[])) ;

The index is then used when the query applies the same cast expression:

EXPLAIN ANALYZE
SELECT * FROM test where
cast(xpath('/book/title', xml_data) as text[]) = '{<title>Apropos</title>}';

gives

                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Index Scan using test_idx on test  (cost=0.13..8.15 rows=1 width=36) (actual time=0.034..0.038 rows=1 loops=1)
   Index Cond: ((xpath('/book/title'::text, xml_data, '{}'::text[]))::text[] = '{<title>Apropos</title>}'::text[])
 Planning time: 0.168 ms
 Execution time: 0.073 ms
(4 rows)

It works the same way when using text():

CREATE INDEX test_idx
ON test USING BTREE 
    (cast(xpath('/book/title/text()', xml_data) as text[])) ;

explain analyze select * from test where
cast(xpath('/book/title/text()', xml_data) as text[]) = '{Apropos}';

gives

                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Index Scan using test_idx on test  (cost=0.13..8.15 rows=1 width=36) (actual time=0.034..0.038 rows=1 loops=1)
   Index Cond: ((xpath('/book/title/text()'::text, xml_data, '{}'::text[]))::text[] = '{Apropos}'::text[])
 Planning time: 0.166 ms
 Execution time: 0.076 ms
(4 rows)
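
The question also mentions namespaces. The same cast should carry over when the namespace mapping is passed as the third argument of xpath(). The following is only a sketch: the prefix `b`, the URI `http://example.com/book`, and the index name `test_ns_idx` are hypothetical, and the query has to repeat the indexed expression exactly for the planner to consider the index.

```sql
-- Hypothetical namespace mapping: prefix "b" -> http://example.com/book
CREATE INDEX test_ns_idx
ON test USING BTREE
    (cast(xpath('/b:book/b:title/text()', xml_data,
                ARRAY[ARRAY['b', 'http://example.com/book']]) as text[]));

-- The WHERE clause uses the same expression as the index definition.
SELECT * FROM test WHERE
cast(xpath('/b:book/b:title/text()', xml_data,
           ARRAY[ARRAY['b', 'http://example.com/book']]) as text[]) = '{Apropos}';
```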

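Note that because the test table I created contains only 4 rows, the planner would otherwise choose a sequential scan, so I forced the use of the index with the following command.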

SET enable_seqscan TO off;
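
Since this setting is only meant for testing, one way to keep it from leaking into the rest of the session is to scope it to a transaction with SET LOCAL, or to reset it afterwards. This is a sketch that reuses the text() query from above:

```sql
-- Scope the planner setting to a single transaction.
BEGIN;
SET LOCAL enable_seqscan TO off;

EXPLAIN ANALYZE
SELECT * FROM test WHERE
cast(xpath('/book/title/text()', xml_data) as text[]) = '{Apropos}';

ROLLBACK;  -- the SET LOCAL value is discarded when the transaction ends

-- If a plain SET was used earlier in the session, restore the default:
RESET enable_seqscan;
```

On a table with enough rows and a selective condition, the planner may pick the index without this setting.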

Archived:	7 years, 5 months ago
Viewed:	810 times
Last activity:	7 years, 5 months ago