tom*_*mka 5 postgresql datatypes spatial composite-types
As shown in another question, I am dealing with a lot (> 10,000,000) of point entries in 3D space. These points are defined as follows:
CREATE TYPE float3d AS (
x real,
y real,
z real);
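If I am not mistaken, it takes 3*8 bytes + 8 bytes of padding (MAXALIGN is 8) to store one of these points. Is there a better way to store this kind of data? In the question mentioned above, it was pointed out that composite types involve quite a bit of overhead.
I often run spatial queries like this: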
SELECT t1.id, t1.parent_id, (t1.location).x, (t1.location).y, (t1.location).z,
t1.confidence, t1.radius, t1.skeleton_id, t1.user_id,
t2.id, t2.parent_id, (t2.location).x, (t2.location).y, (t2.location).z,
t2.confidence, t2.radius, t2.skeleton_id, t2.user_id
FROM treenode t1
INNER JOIN treenode t2 ON
( (t1.id = t2.parent_id OR t1.parent_id = t2.id)
OR (t1.parent_id IS NULL AND t1.id = t2.id))
WHERE (t1.LOCATION).z = 41000.0
AND (t1.LOCATION).x > 2822.6
AND (t1.LOCATION).x < 62680.2
AND (t1.LOCATION).y > 33629.8
AND (t1.LOCATION).y < 65458.6
AND t1.project_id = 1 LIMIT 5000;
A query like this takes about 160 ms, but I wonder whether that can be reduced.
This is the layout of the table the type is used in:
Column | Type | Modifiers
---------------+--------------------------+-------------------------------------------------------
id | bigint | not null default nextval('location_id_seq'::regclass)
user_id | integer | not null
creation_time | timestamp with time zone | not null default now()
edition_time | timestamp with time zone | not null default now()
project_id | integer | not null
location | float3d | not null
editor_id | integer |
parent_id | bigint |
radius | real | not null default 0
confidence | smallint | not null default 5
skeleton_id | integer | not null
Indexes:
"treenode_pkey" PRIMARY KEY, btree (id)
"treenode_parent_id" btree (parent_id)
"treenode_project_id_location_x_index" btree (project_id, ((location).x))
"treenode_project_id_location_y_index" btree (project_id, ((location).y))
"treenode_project_id_location_z_index" btree (project_id, ((location).z))
"treenode_project_id_skeleton_id_index" btree (project_id, skeleton_id)
"treenode_project_id_user_id_index" btree (project_id, user_id)
"treenode_skeleton_id_index" btree (skeleton_id)
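A composite type is clean design, but it does nothing for performance.
First of all, float translates to float8, a.k.a. double precision, in Postgres. You are building on a misunderstanding.
The data type real occupies 4 bytes (not 8). It has to be aligned at multiples of 4 bytes.
Measure the actual size with pg_column_size().
SQL Fiddle demonstrating actual sizes.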
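A minimal sketch of such a measurement, in case the fiddle is unavailable. It uses an anonymous ROW value in place of the composite type so that it runs without any setup; that substitution is an assumption of this example, not something taken from the original fiddle.

```sql
-- Size of the scalar types involved: real is 4 bytes, double precision is 8.
SELECT pg_column_size(1::real)             AS real_size,
       pg_column_size(1::double precision) AS float8_size;

-- Size of a three-field row value, including its tuple header.
SELECT pg_column_size(ROW(1::real, 2::real, 3::real)) AS row_of_3_reals;

-- With the type from the question in place, the same check would be:
-- SELECT pg_column_size('(1,2,3)'::float3d);
```

Comparing the second result with the 12 bytes of actual payload shows how much of the value is header overhead.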
The composite type real3d occupies 36 bytes. That is:
23 byte tuple header
1 byte padding
4 bytes real x
4 bytes real y
4 bytes real z
---
36 bytes
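If you embed it in a table, padding may have to be added. On the other hand, the type header can be 3 bytes smaller on disk. The on-disk representation is typically a bit smaller than the representation in RAM. It does not make much of a difference.
More:
Use this equivalent design to reduce the row size substantially: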
Column | Type | Modifiers
---------------+--------------------------+---------------------------------
id | bigint | not null default nextval(...
creation_time | timestamp with time zone | not null default now()
edition_time | timestamp with time zone | not null default now()
user_id | integer | not null
project_id | integer | not null
location_x | real | not null
location_y | real | not null
location_z | real | not null
radius | real | not null default 0
skeleton_id | integer | not null
confidence | smallint | not null default 5
parent_id | bigint |
editor_id | integer |
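One way to get to that layout is to build a new table and copy the data over, since the position of existing columns cannot be changed in place. The following is only a sketch under assumptions: the table name treenode_new and the index name are hypothetical, the id default is left out because it is elided in the layout above, and constraints, remaining indexes, and the final rename are omitted.

```sql
-- Hypothetical target table; column order follows the layout above.
-- The id default is elided in that layout, so it is not reproduced here.
CREATE TABLE treenode_new (
    id            bigint      NOT NULL,
    creation_time timestamptz NOT NULL DEFAULT now(),
    edition_time  timestamptz NOT NULL DEFAULT now(),
    user_id       integer     NOT NULL,
    project_id    integer     NOT NULL,
    location_x    real        NOT NULL,
    location_y    real        NOT NULL,
    location_z    real        NOT NULL,
    radius        real        NOT NULL DEFAULT 0,
    skeleton_id   integer     NOT NULL,
    confidence    smallint    NOT NULL DEFAULT 5,
    parent_id     bigint,
    editor_id     integer
);

-- Copy the rows, unpacking the composite column into three plain columns.
INSERT INTO treenode_new
      (id, creation_time, edition_time, user_id, project_id,
       location_x, location_y, location_z,
       radius, skeleton_id, confidence, parent_id, editor_id)
SELECT id, creation_time, edition_time, user_id, project_id,
       (location).x, (location).y, (location).z,
       radius, skeleton_id, confidence, parent_id, editor_id
FROM   treenode;

-- Plain-column index standing in for the expression index on (location).z.
CREATE INDEX treenode_new_project_id_location_z_index
    ON treenode_new (project_id, location_z);
```

Queries would then filter on location_x, location_y and location_z directly instead of (location).x and so on.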
Test before and after to verify my claim:
SELECT pg_relation_size('treenode') AS table_size;
SELECT avg(pg_column_size(t)) AS avg_row_size
FROM treenode t;
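For easier reading, the same check can be wrapped in pg_size_pretty(), and pg_total_relation_size() adds indexes and TOAST data to the picture. This variant is an addition for illustration, not part of the original answer:

```sql
-- Heap only vs. heap plus indexes and TOAST, in human-readable units.
SELECT pg_size_pretty(pg_relation_size('treenode'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('treenode')) AS total_size;
```

Running it against both the old and the new table gives a direct comparison; for a fair result, both tables should be freshly loaded or vacuumed so that bloat does not skew the numbers.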
More details: