Related questions and solutions (0)

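Best practice for storing record metadata

What is the best practice for storing metadata for individual records in a database?

I need to store common metadata, such as creation time and last-update time, for many tables in my database. I have found a few different solutions: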

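Store the metadata directly in the table.

Pros:
- Metadata is directly linked to the record
- No join is needed to retrieve the metadata
Cons:
- Requires many duplicated columns (unless inheritance is used)
- Metadata and business data are not separated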
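
Create one generic metadata table and use soft foreign keys to link the data to the correct table and record.

Pros:
- No duplicated columns
- Metadata is separated from business data
Cons:
- No direct link between metadata and data (foreign keys cannot be used)
- Joins need additional conditions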
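
Create a separate metadata table for every table that needs metadata.

Pros:
- Metadata is directly linked to the record
- Metadata is separated from business data
Cons:
- Requires many extra tables
- Requires many duplicated columns (unless inheritance is used)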
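
To make the "joins need additional conditions" drawback of the generic-table option concrete, here is a minimal sketch of that layout. It uses SQLite through Python's built-in sqlite3 module only so it runs without a server; the table and column names (customer, invoice, record_metadata, table_name, record_id) are hypothetical, and a PostgreSQL version would more likely use timestamptz columns than TEXT.

```python
import sqlite3

# Hypothetical schema for the generic-table option: one metadata table linked
# to business rows by a "soft" foreign key (table name + row id).
SCHEMA = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE invoice  (id INTEGER PRIMARY KEY, amount REAL NOT NULL);
CREATE TABLE record_metadata (
    table_name TEXT    NOT NULL,  -- which business table the row lives in
    record_id  INTEGER NOT NULL,  -- id in that table; no real FK is possible
    created_at TEXT    NOT NULL,
    updated_at TEXT    NOT NULL,
    PRIMARY KEY (table_name, record_id)
);
"""


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
    conn.execute("INSERT INTO invoice VALUES (1, 99.5)")
    # Both business rows have id 1, so record_id alone is ambiguous.
    conn.executemany(
        "INSERT INTO record_metadata VALUES (?, ?, ?, ?)",
        [
            ("customer", 1, "2013-06-01", "2013-06-10"),
            ("invoice", 1, "2013-06-05", "2013-06-05"),
        ],
    )
    return conn


def customers_with_metadata(conn: sqlite3.Connection) -> list:
    # The extra table_name predicate is the "additional join condition".
    return conn.execute(
        """
        SELECT c.id, c.name, m.created_at, m.updated_at
        FROM customer AS c
        JOIN record_metadata AS m
          ON m.record_id = c.id AND m.table_name = 'customer'
        """
    ).fetchall()


def customers_without_table_filter(conn: sqlite3.Connection) -> list:
    # Same join without the table_name predicate: matches both metadata rows.
    return conn.execute(
        """
        SELECT c.id, m.table_name
        FROM customer AS c
        JOIN record_metadata AS m ON m.record_id = c.id
        """
    ).fetchall()
```

Without the table_name predicate, customer 1 also picks up the invoice's metadata row, which is the ambiguity a real foreign key would have ruled out.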

Are there more options, pros, or cons than the ones I have mentioned here? What is the best practice for storing this metadata?

postgresql database-design metadata

Tid*_*ddo

2013-06-14

Score: 13, Answers: 1, Views: 5780

How do completely empty columns in a large table affect performance?

I have 400 million rows in a Postgres database, in a table with 18 columns:

id serial NOT NULL,
a integer,
b integer,
c integer,
d smallint,
e timestamp without time zone,
f smallint,
g timestamp without time zone,
h integer,
i timestamp without time zone,
j integer,
k character varying(32),
l integer,
m smallint,
n smallint,
o character varying(36),
p character varying(100),
q character varying(100)

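Columns e, k, and n are always NULL: they store no values at all and are completely useless at this point. They were part of the original design but were never removed.

Edit: most of the other columns are non-NULL.

Questions:

How can I calculate the impact on storage? Is it equal to column size * number of rows?
Would dropping these empty columns noticeably improve the performance of this table? Would the page cache be able to hold more rows?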
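
For the first question, the per-row arithmetic can be sketched under stated assumptions. In PostgreSQL a NULL value takes no space in a row's data area; instead, any row containing at least one NULL carries a null bitmap with one bit per column, placed right after the 23-byte fixed tuple header, and the header is then padded to MAXALIGN (assumed here to be 8 bytes, as on typical 64-bit builds). The function name is illustrative, and the sketch ignores the per-row line pointer and any alignment inside the data area.

```python
HEADER_FIXED_BYTES = 23  # fixed part of a PostgreSQL heap tuple header
MAXALIGN = 8             # assumption: typical 64-bit build


def tuple_header_bytes(n_columns: int, row_has_null: bool,
                       maxalign: int = MAXALIGN) -> int:
    """Padded heap-tuple header size for one row.

    NULLs take no space in the data area; a row with any NULL gets a null
    bitmap of one bit per column right after the fixed header.
    """
    size = HEADER_FIXED_BYTES
    if row_has_null:
        size += (n_columns + 7) // 8          # bitmap bytes, rounded up
    return -(-size // maxalign) * maxalign    # pad up to a multiple of maxalign
```

Under these assumptions, an 18-column row with at least one NULL has a 32-byte header and a NULL-free row has 24, so the always-NULL columns cost roughly the bitmap rather than column size * number of rows. With 15 columns and some NULL still present in a row, the header would remain 32 bytes.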

postgresql performance database-design storage disk-space postgresql-performance

ebi*_*ebi

2020-01-08

Score: 13, Answers: 1, Views: 4824

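How do I get an aggregate of a window function in Postgres?

I have a table with two integer-array columns holding permutations/combinations, plus a third column holding a value, like this: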

CREATE TABLE foo
(
  perm integer[] NOT NULL,
  combo integer[] NOT NULL,
  value numeric NOT NULL DEFAULT 0
);
INSERT INTO foo
VALUES
( '{3,1,2}', '{1,2,3}', '1.1400' ),
( '{3,1,2}', '{1,2,3}', '0' ),
( '{3,1,2}', '{1,2,3}', '1.2680' ),
( '{3,1,2}', '{1,2,3}', '0' ),
( '{3,1,2}', '{1,2,3}', '1.2680' ),
( '{3,1,2}', '{1,2,3}', '0' ),
( '{3,1,2}', '{1,2,3}', '0' ),
( '{3,1,2}', '{1,2,3}', '1.2680' ),
( '{3,1,2}', '{1,2,3}', '0.9280' ),
( '{3,1,2}', '{1,2,3}', '0' ),
( '{3,1,2}', '{1,2,3}', '1.2680' ),
( …

postgresql aggregate window-functions

Sco*_*all

lucky-day

Score: 11, Answers: 1, Views: 10k

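Optimizing a query over a range of timestamps (one column)

I am using Postgres 9.3 through Heroku.

I have a table, "traffic", with 1 million records and many inserts and updates every day. I need to run SUM operations over this table across different time ranges. These calls can take up to 40 seconds, and I would love to hear suggestions on how to improve that.

I have the following index on this table: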

CREATE INDEX idx_traffic_partner_only ON traffic (dt_created) WHERE campaign_id IS NULL AND uuid_self <> uuid_partner;

Here is an example SELECT statement:

SELECT SUM("clicks") AS clicks, SUM("impressions") AS impressions
FROM "traffic"
WHERE "uuid_self" != "uuid_partner"
AND "campaign_id" is NULL
AND "dt_created" >= 'Sun, 29 Mar 2015 00:00:00 +0000'
AND "dt_created" <= 'Mon, 27 Apr 2015 23:59:59 +0000'

Here is the EXPLAIN ANALYZE output:

Aggregate  (cost=21625.91..21625.92 rows=1 width=16) (actual time=41804.754..41804.754 rows=1 loops=1)
  ->  Index Scan using idx_traffic_partner_only on …

postgresql performance index optimization postgresql-9.3 postgresql-performance

Eva*_*eby

2020-01-08

Score: 9, Answers: 1, Views: 5849

Will fixed-width rows improve PostgreSQL read performance?

I have a table, articles:

                                                       Table "articles"
     Column     |            Type             |                     Modifiers                      | Storage  | Stats target | Description
----------------+-----------------------------+----------------------------------------------------+----------+--------------+-------------
 id             | integer                     | not null default nextval('articles_id_seq'::regclass) | plain    |              |
 user_id        | integer                     |                                                    | plain    |              |
 title          | character varying(255)      |                                                    | extended |              |
 author         | character varying(255)      |                                                    | extended |              |
 body           | text                        | default '--- []                                   +| extended |              |
                |                             | '::text                                            |          |              |
 created_at     | timestamp without time zone | …

postgresql performance datatypes postgresql-9.4 query-performance

bob*_*opy

2020-06-15

Score: 9, Answers: 1, Views: 1694

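Does it make sense to store several booleans as an array?

I have a table with five boolean columns. In more than 90% of the rows, all of these columns are null. (For me, false is equivalent to null.)

Instead of having boolean columns, I could have a single column holding an array of a custom enum type, and so store only the columns that are non-null.

Using an array feels odd to me, but my colleague pointed out that there is no really strong reason against it, and that we might actually see savings from it because we would not be storing a bunch of null columns.

Are there any downsides to using an array? Specifically: would it take more space, take longer to query, or prevent the use of Postgres features such as GIN indexes?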
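
The proposed representation can be sketched as a pair of conversion functions. This is a plain-Python model, not PostgreSQL code: the enum and its five member names are hypothetical stand-ins for the unnamed boolean columns, and because the question treats false and NULL as equivalent, both collapse to "absent from the array".

```python
from enum import Enum
from typing import Dict, List, Optional


class Flag(Enum):
    # Hypothetical names standing in for the five boolean columns.
    FLAG_1 = "flag_1"
    FLAG_2 = "flag_2"
    FLAG_3 = "flag_3"
    FLAG_4 = "flag_4"
    FLAG_5 = "flag_5"


def to_flag_array(row: Dict[str, Optional[bool]]) -> List[Flag]:
    """Five nullable booleans -> list of the flags that are TRUE.

    NULL (None) and FALSE are both dropped, matching the question's
    "false is equivalent to null".
    """
    return [flag for flag in Flag if row.get(flag.value) is True]


def from_flag_array(flags: List[Flag]) -> Dict[str, bool]:
    """Inverse mapping: any flag missing from the array comes back FALSE."""
    present = set(flags)
    return {flag.value: flag in present for flag in Flag}
```

The round trip is lossless only in that sense: a row that was entirely NULL becomes an empty array and comes back as all false.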

postgresql performance database-design datatypes array

Xod*_*rap

2016-07-14

Score: 9, Answers: 1, Views: 3606

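Store millions of rows of denormalized data, or some SQL magic?

My DBA experience is limited to simple storage and retrieval of CMS-style data, so this may be a silly question, I don't know!

I have a problem where I need to look up or calculate holiday prices for a specific group size and a specific number of nights within a specific time period. For example:

How much is a hotel room for 2 people for 4 nights at any time in January?

I have pricing and availability data for 5000 hotels, for example like this: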

Hotel ID | Date | Spaces | Price PP
-----------------------------------
     123 | Jan1 | 5      | 100
     123 | Jan2 | 7      | 100
     123 | Jan3 | 5      | 100
     123 | Jan4 | 3      | 100
     123 | Jan5 | 5      | 100
     123 | Jan6 | 7      | 110
     456 | Jan1 | 5      | 120
     456 | Jan2 …
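
The calculation the question describes can be sketched in plain Python as a sliding window over consecutive nights. Assumptions: "Price PP" is a price per person per night, "Spaces" is the number of places still available that night, and a stay is valid only if every night has at least as many spaces as the group size. Only the rows fully visible in the excerpt are used (hotel 456 has just its Jan1 row here), January days are encoded as integers, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# (hotel_id, day_of_january, spaces, price_per_person): the visible rows above
ROWS = [
    (123, 1, 5, 100), (123, 2, 7, 100), (123, 3, 5, 100),
    (123, 4, 3, 100), (123, 5, 5, 100), (123, 6, 7, 110),
    (456, 1, 5, 120),
]


def stay_prices(rows, nights: int, people: int) -> List[Tuple[int, int, int]]:
    """Return (hotel_id, first_night, total_price) for every run of `nights`
    consecutive nights on which each night has at least `people` spaces."""
    by_hotel: Dict[int, Dict[int, Tuple[int, int]]] = {}
    for hotel, day, spaces, price in rows:
        by_hotel.setdefault(hotel, {})[day] = (spaces, price)

    results = []
    for hotel, days in sorted(by_hotel.items()):
        for start in sorted(days):
            window = [days.get(start + k) for k in range(nights)]
            # A missing night (None) or too few spaces rules the stay out.
            if all(n is not None and n[0] >= people for n in window):
                total = sum(n[1] for n in window) * people
                results.append((hotel, start, total))
    return results
```

For 2 people and 4 nights, this gives three valid start nights for hotel 123 (Jan1, Jan2 and Jan3), the last one costing more because it includes the 110 night.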

postgresql window-functions datetime denormalization

Guy*_*den

2014-09-02

Score: 8, Answers: 1, Views: 769

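Speeding up creation of Postgres partial indexes

I am trying to create partial indexes on a large (1.2 TB) static table in Postgres 9.4.

My data is completely static, so I can insert all the data first and then create all the indexes.

This 1.2 TB table has a column named run_id that cleanly partitions the data. We get excellent performance by creating indexes that each cover a range of run_ids. Here is an example: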

CREATE INDEX perception_run_frame_idx_run_266_thru_270
ON run.perception
(run_id, frame)
WHERE run_id >= 266 AND run_id <= 270;
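
Because each index covers a fixed block of run_ids, the statements can be generated rather than written by hand. A minimal sketch that reproduces the example exactly and extends the same naming pattern to consecutive blocks; the helper names and the block width of 5 (taken from 266 to 270) are assumptions, and the sketch only builds SQL text, it does not execute anything or make index builds faster.

```python
from typing import List


def partial_index_ddl(first_run: int, last_run: int) -> str:
    """CREATE INDEX statement in the same shape as the example above,
    covering run_id values first_run..last_run (inclusive)."""
    name = f"perception_run_frame_idx_run_{first_run}_thru_{last_run}"
    return (
        f"CREATE INDEX {name}\n"
        "ON run.perception\n"
        "(run_id, frame)\n"
        f"WHERE run_id >= {first_run} AND run_id <= {last_run};"
    )


def batch_ddl(start: int, stop: int, width: int = 5) -> List[str]:
    """One statement per block of `width` run_ids, covering start..stop."""
    return [
        partial_index_ddl(lo, min(lo + width - 1, stop))
        for lo in range(start, stop + 1, width)
    ]
```

Each generated statement stands alone, so the blocks can be reviewed and run one at a time.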

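These partial indexes give us the query speed we need. Unfortunately, creating each partial index takes about 70 minutes.

It looks like we are CPU-bound (top shows the process at 100%).
Is there anything I can do to speed up the creation of the partial indexes?

System specs:

18-core Xeon
192 GB RAM
12 SSDs in RAID
Autovacuum off
maintenance_work_mem: 64 GB (too high?)

Table specs:

Size: 1.26 TB
Rows: 10.537 billion
Typical index size: 3.2 GB (varies by ~0.5 GB)

Table definition: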

CREATE TABLE run.perception(
id bigint NOT NULL,
run_id bigint NOT NULL,
frame bigint NOT NULL,
by character varying(45) NOT NULL,
by_anyone bigint NOT …

postgresql performance index ddl performance-tuning postgresql-performance

bur*_*nsy

2020-01-08

Score: 8, Answers: 1, Views: 3546

Filter on a text[] array and sort by timestamp

Description

PostgreSQL 9.6 on Linux. The tags_tmp table is about 30 GB (10 million rows); tags is a text[] and has only 6 values.

tags_tmp(id int, tags text[], maker_date timestamp, value text)

I need to retrieve data with a filter on tags and ORDER BY maker_date DESC. Can I create an index on both columns, tags and maker_date desc?

If not, can you suggest other ideas?

Query example

select id, tags, maker_date, value
from tags_tmp
where  tags && array['a','b']
order by maker_date desc
limit 5 offset 0
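
To pin down what the example query asks for, here is a plain-Python model of its semantics: tags && array['a','b'] keeps rows whose tags share at least one element with the given array, and the result is then ordered by maker_date descending with LIMIT/OFFSET applied. The row type and sample data are hypothetical, and the model says nothing about which index PostgreSQL would choose; it only shows that the filter and the sort key live in different columns.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class TagRow(NamedTuple):
    id: int
    tags: List[str]
    maker_date: datetime
    value: str


def overlap_query(rows: Iterable[TagRow], wanted: List[str],
                  limit: int = 5, offset: int = 0) -> List[TagRow]:
    """Model of: WHERE tags && wanted ORDER BY maker_date DESC
    LIMIT limit OFFSET offset."""
    wanted_set = set(wanted)
    # && is true when the two arrays have at least one element in common.
    matching = [r for r in rows if wanted_set.intersection(r.tags)]
    matching.sort(key=lambda r: r.maker_date, reverse=True)
    return matching[offset:offset + limit]
```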

SQL code:

create index idx1 on tags_tmp using gin (tags);
create …

postgresql performance order-by index-tuning postgresql-performance

Lua*_*ynh

2020-01-08

Score: 8, Answers: 1, Views: 6302

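Managing and speeding up queries on a PostgreSQL table with more than 3 trillion rows

I have more than 10 years of time-series data: more than 3 trillion rows with 10 columns.

Currently I use a PCIe SSD with 128 GB of RAM, and queries take a very long time. For example, running the following takes more than 15 minutes: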

SELECT * FROM tbl WHERE column_a = 'value1' AND column_b = 'value2';

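The table is mostly read. The only time it is written to is during a weekly update, when about 15 million rows are inserted.

What is the best way to manage such a large table? Would you recommend splitting it by year?

The table size is 542 GB, and the external size is 109 GB.

EXPLAIN (BUFFERS, ANALYZE) output: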

"Seq Scan on table  (cost=0.00..116820941.44 rows=758 width=92) (actual time=0.011..1100643.844 rows=667 loops=1)"
"  Filter: (("COLUMN_A" = 'Value1'::text) AND ("COLUMN_B" = 'Value2'::text))"
"  Rows Removed by Filter: 4121893840"
"  Buffers: shared hit=2 read=56640470 dirtied=476248 written=476216"
"Total runtime: 1100643.967 ms"
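
The plan's buffer and row counts can be turned into a few back-of-the-envelope figures. This sketch assumes the default 8 KiB PostgreSQL block size, counts only the "read" buffers (not the 2 shared hits), and uses a hypothetical helper name; the inputs are copied from the plan above.

```python
BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size (8 KiB)


def scan_stats(pages_read: int, runtime_ms: float, rows_returned: int,
               rows_removed: int, block_size: int = BLOCK_SIZE) -> dict:
    """Rough figures derived from an EXPLAIN (BUFFERS, ANALYZE) line."""
    bytes_read = pages_read * block_size
    seconds = runtime_ms / 1000
    rows_scanned = rows_returned + rows_removed
    return {
        "gb_read": bytes_read / 1e9,
        "mb_per_s": bytes_read / 1e6 / seconds,
        "rows_scanned": rows_scanned,
        "fraction_kept": rows_returned / rows_scanned,
    }


# Figures from the plan: read=56640470, 667 rows kept,
# 4121893840 removed by the filter, total runtime 1100643.967 ms.
stats = scan_stats(56_640_470, 1_100_643.967, 667, 4_121_893_840)
```

Under the 8 KiB assumption, the sequential scan reads about 464 GB at roughly 420 MB/s to keep 667 of about 4.1 billion scanned rows, which suggests the runtime goes mostly into reading pages rather than evaluating the filter.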

The table was created with the following code: