Good layout for 3D point data for spatial queries in Postgres?

Question


tom*_*mka 5 postgresql datatypes spatial composite-types

As mentioned in another question, I deal with a lot (> 10,000,000) of point entries in 3D space. These points are defined like this:

CREATE TYPE float3d AS (
  x real,
  y real,
  z real);


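If I'm not mistaken, it takes 3*8 bytes + 8 bytes of padding (MAXALIGN is 8) to store one of these points. Is there a better way to store this kind of data? In the question mentioned above, it was pointed out that composite types involve quite some overhead.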

I often run spatial queries like this one:

  SELECT t1.id, t1.parent_id, (t1.location).x, (t1.location).y, (t1.location).z,
         t1.confidence, t1.radius, t1.skeleton_id, t1.user_id,
         t2.id, t2.parent_id, (t2.location).x, (t2.location).y, (t2.location).z,
         t2.confidence, t2.radius, t2.skeleton_id, t2.user_id
  FROM treenode t1
       INNER JOIN treenode t2 ON
         (   (t1.id = t2.parent_id OR t1.parent_id = t2.id)
          OR (t1.parent_id IS NULL AND t1.id = t2.id))
        WHERE (t1.LOCATION).z = 41000.0
          AND (t1.LOCATION).x > 2822.6
          AND (t1.LOCATION).x < 62680.2
          AND (t1.LOCATION).y > 33629.8
          AND (t1.LOCATION).y < 65458.6
          AND t1.project_id = 1 LIMIT 5000;


A query like this takes about 160 ms, but I wonder whether that can be reduced.

This is the layout of the table the type is used in:

    Column     |           Type           |                       Modifiers                    
---------------+--------------------------+-------------------------------------------------------
 id            | bigint                   | not null default nextval('location_id_seq'::regclass)
 user_id       | integer                  | not null
 creation_time | timestamp with time zone | not null default now()
 edition_time  | timestamp with time zone | not null default now()
 project_id    | integer                  | not null
 location      | float3d                  | not null
 editor_id     | integer                  |
 parent_id     | bigint                   |
 radius        | real                     | not null default 0
 confidence    | smallint                 | not null default 5
 skeleton_id   | integer                  | not null

Indexes:
    "treenode_pkey" PRIMARY KEY, btree (id)
    "treenode_parent_id" btree (parent_id)
    "treenode_project_id_location_x_index" btree (project_id, ((location).x))
    "treenode_project_id_location_y_index" btree (project_id, ((location).y))
    "treenode_project_id_location_z_index" btree (project_id, ((location).z))
    "treenode_project_id_skeleton_id_index" btree (project_id, skeleton_id)
    "treenode_project_id_user_id_index" btree (project_id, user_id)
    "treenode_skeleton_id_index" btree (skeleton_id)


Answer 1

Erw*_*ter 3

A composite type is a clean design, but it does nothing for performance.

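First off, float translates to float8 aka double precision in Postgres. You are building on a misconception:
the data type real occupies 4 bytes (not 8), and it has to be aligned at multiples of 4 bytes.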
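The alignment rule is plain arithmetic. Below is a minimal sketch, a simplified Python model written only for illustration: align and packed_size are hypothetical helpers, not PostgreSQL functions. Each fixed-width value starts at the next multiple of its alignment, and the values are placed back to back.

```python
# Simplified, illustrative model of fixed-width value alignment (not PostgreSQL code).

def align(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

def packed_size(fields: list[tuple[int, int]]) -> int:
    """Bytes used by (size, alignment) fields placed in order, padding included."""
    offset = 0
    for size, alignment in fields:
        offset = align(offset, alignment) + size
    return offset

REAL = (4, 4)    # real / float4: 4 bytes, aligned at 4
FLOAT8 = (8, 8)  # float / float8 / double precision: 8 bytes, aligned at 8

print(packed_size([REAL] * 3))    # 12: x, y, z with no padding between them
print(packed_size([FLOAT8] * 3))  # 24: what three float8 fields would need
```

Three real fields therefore fit in 12 bytes without internal padding. The 3*8 bytes in the question's estimate would only apply to float8 fields.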

Measure actual sizes with pg_column_size().

SQL Fiddle demonstrating actual sizes.

The composite type real3d occupies 36 bytes. That is:

23 byte tuple header
1 byte padding
4 bytes real x
4 bytes real y
4 bytes real z
---
36 bytes


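If you embed it in a table, padding may have to be added before it. On the other hand, the type header can be 3 bytes smaller on disk; the on-disk representation is generally a bit smaller than the in-RAM one. It does not make much of a difference either way.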
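The 36-byte breakdown and the 3-byte on-disk saving can be reproduced with a small Python model. This is a sketch under stated assumptions: MAXALIGN = 8, no NULL fields (so no null bitmap), and composite_size is a hypothetical helper. It does not model any padding the surrounding table row may add before the column.

```python
# Simplified model of a composite (row-type) value such as float3d.
# Assumptions: MAXALIGN = 8 (typical 64-bit build), no NULL fields.
MAXALIGN = 8
HEAP_TUPLE_HEADER = 23   # header bytes; its first 4 bytes double as the varlena length word
SHORT_HEADER_SAVING = 3  # on disk, a small value may use a 1-byte instead of a 4-byte length

def align(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

def composite_size(fields: list[tuple[int, int]]) -> int:
    """In-memory size of a composite value with the given (size, alignment) fields."""
    offset = align(HEAP_TUPLE_HEADER, MAXALIGN)  # 23 -> 24: the 1 byte of padding
    for size, alignment in fields:
        offset = align(offset, alignment) + size
    return offset

in_memory = composite_size([(4, 4)] * 3)  # three real fields: x, y, z
on_disk = in_memory - SHORT_HEADER_SAVING
print(in_memory, on_disk)  # 36 33
```

pg_column_size() remains the authoritative check; the model only shows where the bytes go.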


Table layout

Use this equivalent design to reduce the row size substantially:

    Column     |           Type           |                       Modifiers
---------------+--------------------------+---------------------------------
 id            | bigint                   | not null default nextval(...
 creation_time | timestamp with time zone | not null default now()
 edition_time  | timestamp with time zone | not null default now()
 user_id       | integer                  | not null
 project_id    | integer                  | not null
 location_x    | real                     | not null
 location_y    | real                     | not null
 location_z    | real                     | not null
 radius        | real                     | not null default 0
 skeleton_id   | integer                  | not null
 confidence    | smallint                 | not null default 5
 parent_id     | bigint                   |
 editor_id     | integer                  |

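To see where the savings come from, here is a rough Python model of one heap tuple for both column orders. It rests on these assumptions:

- MAXALIGN = 8.
- No NULL values, so there is no null bitmap.
- location is stored on disk as a 33-byte value with a 1-byte header and no alignment padding.

tuple_size and the SIZE_ALIGN table are hypothetical, and the model ignores the 4-byte line pointer each row also needs.

```python
# Rough model of one treenode heap tuple (hypothetical helper, illustration only).
# Assumptions: MAXALIGN = 8, no NULLs (no null bitmap), location stored on disk
# as a 33-byte value with a 1-byte header and no alignment padding.
MAXALIGN = 8
HEADER = 23  # heap tuple header; column data starts at the next MAXALIGN boundary

SIZE_ALIGN = {  # type name -> (size in bytes, alignment in bytes)
    "bigint": (8, 8),
    "timestamptz": (8, 8),
    "integer": (4, 4),
    "real": (4, 4),
    "smallint": (2, 2),
    "float3d": (33, 1),  # assumption: short-header composite, stored unaligned
}

def align(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

def tuple_size(column_types: list[str]) -> int:
    """Header plus column data, with alignment padding between columns."""
    offset = align(HEADER, MAXALIGN)
    for type_name in column_types:
        size, alignment = SIZE_ALIGN[type_name]
        offset = align(offset, alignment) + size
    return offset

# Column types in the order of the original and the reordered table layouts.
original = ["bigint", "integer", "timestamptz", "timestamptz", "integer", "float3d",
            "integer", "bigint", "real", "smallint", "integer"]
reordered = ["bigint", "timestamptz", "timestamptz", "integer", "integer",
             "real", "real", "real", "real", "integer", "smallint", "bigint", "integer"]

print(tuple_size(original), tuple_size(reordered))  # 124 92
print(align(tuple_size(original), MAXALIGN), align(tuple_size(reordered), MAXALIGN))  # 128 96
```

Under these assumptions the data area shrinks from 100 to 68 bytes per row. Including the header, that is 124 to 92 bytes, or 128 to 96 once each tuple is MAXALIGNed on the page. Modelling location in its in-memory form (36 bytes, 8-byte aligned) gives the same 124 for the original layout. A NULL in parent_id or editor_id adds a null bitmap and changes both figures, so measuring with pg_relation_size() and pg_column_size() remains the real test.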

Test before and after to verify my claim:

SELECT pg_relation_size('treenode') AS table_size;

SELECT avg(pg_column_size(t)) AS avg_row_size
FROM   treenode t;


More details:

Measure the size of a PostgreSQL table row
