Separate month and year columns, or a date with the day always set to 1?

Dav*_*ton 17 postgresql database-design datetime

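I'm building a database with Postgres where a lot of things will be grouped by month and year, but never by the date.

I could create integer month and year columns and use those.
Or I could have a month_year column and always set the day to 1.

The former seems a bit simpler and clearer if someone is looking at the data, but the latter is nice in that it uses a proper type.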

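Personally, if it is a date, or can be a date, I suggest always storing it as one. As a rule of thumb, it is just easier to work with.

A date is 4 bytes.
A smallint is 2 bytes (and we need two of them):
- ... 2 bytes: one smallint for the year
- ... 2 bytes: one smallint for the month

So the storage cost is the same. You can have one date, which supports a day if you ever need it, or two smallints for year and month, which will never support the extra precision.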
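As a minimal sketch of the single-date option, the table below stores the first day of each month and rejects anything else. The table name `monthly_sales` and its columns are hypothetical, chosen only for illustration.

```sql
-- Hypothetical table: each row is keyed by the first day of its month.
CREATE TABLE monthly_sales (
  month_year date    NOT NULL
    CHECK (extract(day FROM month_year) = 1),  -- reject any day other than the 1st
  amount     numeric NOT NULL
);

INSERT INTO monthly_sales VALUES
  (make_date(2014, 1, 1), 10),
  (make_date(2014, 1, 1), 5),
  (make_date(2014, 2, 1), 7);

-- Year and month are still easy to read back out when needed.
SELECT extract(year  FROM month_year)::int AS year,
       extract(month FROM month_year)::int AS month,
       sum(amount)                         AS total
FROM monthly_sales
GROUP BY month_year
ORDER BY month_year;
```

The CHECK constraint is what keeps the "day is always 1" convention from silently drifting.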

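Sample data

Now let's look at an example. Let's create 1 million dates for our sample. That is approximately 5,000 rows per year over the 200 years between 1901 and 2100. Every month of every year should have something.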

CREATE TABLE foo
AS
  SELECT
    x,
    make_date(year,month,1)::date AS date,
    year::smallint,
    month::smallint
  FROM generate_series(1,1e6) AS gs(x)
  CROSS JOIN LATERAL CAST(trunc(random()*12+1+x-x) AS int) AS month
  CROSS JOIN LATERAL CAST(trunc(random()*200+1901+x-x) AS int) AS year
;
CREATE INDEX ON foo(date);
CREATE INDEX ON foo (year,month);
VACUUM FULL ANALYZE foo;


Testing

Simple `WHERE`

Now we can test these theories about not using a date. I ran each query a few times to warm things up.

EXPLAIN ANALYZE SELECT * FROM foo WHERE date = '2014-1-1'
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=11.56..1265.16 rows=405 width=14) (actual time=0.164..0.751 rows=454 loops=1)
   Recheck Cond: (date = '2014-04-01'::date)
   Heap Blocks: exact=439
   ->  Bitmap Index Scan on foo_date_idx  (cost=0.00..11.46 rows=405 width=0) (actual time=0.090..0.090 rows=454 loops=1)
         Index Cond: (date = '2014-04-01'::date)
 Planning time: 0.090 ms
 Execution time: 0.795 ms


Now, let's try the other method, with year and month kept in separate columns

EXPLAIN ANALYZE SELECT * FROM foo WHERE year = 2014 AND month = 1;
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=12.75..1312.06 rows=422 width=14) (actual time=0.139..0.707 rows=379 loops=1)
   Recheck Cond: ((year = 2014) AND (month = 1))
   Heap Blocks: exact=362
   ->  Bitmap Index Scan on foo_year_month_idx  (cost=0.00..12.64 rows=422 width=0) (actual time=0.079..0.079 rows=379 loops=1)
         Index Cond: ((year = 2014) AND (month = 1))
 Planning time: 0.086 ms
 Execution time: 0.749 ms
(7 rows)


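In fairness, the runs are not all 0.749 ms. Some are a bit more or less, but it doesn't matter. They are all roughly the same. Splitting the columns simply isn't needed.

Within one month

Now, let's have some fun with it. Say you want to find all rows within 1 month of January 2014 (the same month we used above).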

EXPLAIN ANALYZE
  SELECT *
  FROM foo
  WHERE date
    BETWEEN
      ('2014-1-1'::date - '1 month'::interval)::date 
      AND ('2014-1-1'::date + '1 month'::interval)::date;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=21.27..2310.97 rows=863 width=14) (actual time=0.384..1.644 rows=1226 loops=1)
   Recheck Cond: ((date >= '2013-12-01'::date) AND (date <= '2014-02-01'::date))
   Heap Blocks: exact=1083
   ->  Bitmap Index Scan on foo_date_idx  (cost=0.00..21.06 rows=863 width=0) (actual time=0.208..0.208 rows=1226 loops=1)
         Index Cond: ((date >= '2013-12-01'::date) AND (date <= '2014-02-01'::date))
 Planning time: 0.104 ms
 Execution time: 1.727 ms
(7 rows)


Compare that with the separate year and month columns

EXPLAIN ANALYZE
  SELECT *
  FROM foo
  WHERE year = 2013 AND month = 12
    OR ( year = 2014 AND ( month = 1 OR month = 2) );

                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=38.79..2999.66 rows=1203 width=14) (actual time=0.664..2.291 rows=1226 loops=1)
   Recheck Cond: (((year = 2013) AND (month = 12)) OR (((year = 2014) AND (month = 1)) OR ((year = 2014) AND (month = 2))))
   Heap Blocks: exact=1083
   ->  BitmapOr  (cost=38.79..38.79 rows=1237 width=0) (actual time=0.479..0.479 rows=0 loops=1)
         ->  Bitmap Index Scan on foo_year_month_idx  (cost=0.00..12.64 rows=421 width=0) (actual time=0.112..0.112 rows=402 loops=1)
               Index Cond: ((year = 2013) AND (month = 12))
         ->  BitmapOr  (cost=25.60..25.60 rows=816 width=0) (actual time=0.218..0.218 rows=0 loops=1)
               ->  Bitmap Index Scan on foo_year_month_idx  (cost=0.00..12.62 rows=420 width=0) (actual time=0.108..0.108 rows=423 loops=1)
                     Index Cond: ((year = 2014) AND (month = 1))
               ->  Bitmap Index Scan on foo_year_month_idx  (cost=0.00..12.38 rows=395 width=0) (actual time=0.108..0.108 rows=401 loops=1)
                     Index Cond: ((year = 2014) AND (month = 2))
 Planning time: 0.256 ms
 Execution time: 2.421 ms
(13 rows)


It's both slower and uglier.
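For reference, the range bounds in the date version come from ordinary interval arithmetic, which can be checked on its own. A small sketch (the column aliases are only illustrative):

```sql
-- Month arithmetic on a date: the interval does the year rollover for us.
SELECT ('2014-01-01'::date - interval '1 month')::date   AS range_start,          -- 2013-12-01
       ('2014-01-01'::date + interval '1 month')::date   AS range_end,            -- 2014-02-01
       ('2014-01-01'::date + interval '15 months')::date AS fifteen_months_later; -- 2015-04-01
```

The first two values match the bounds shown in the `Recheck Cond` of the date-based plan above.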

`GROUP BY`/`ORDER BY`

With the single date column,

EXPLAIN ANALYZE
  SELECT date, count(*)
  FROM foo
  GROUP BY date
  ORDER BY date;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=20564.75..20570.75 rows=2400 width=4) (actual time=286.749..286.841 rows=2400 loops=1)
   Sort Key: date
   Sort Method: quicksort  Memory: 209kB
   ->  HashAggregate  (cost=20406.00..20430.00 rows=2400 width=4) (actual time=285.978..286.301 rows=2400 loops=1)
         Group Key: date
         ->  Seq Scan on foo  (cost=0.00..15406.00 rows=1000000 width=4) (actual time=0.012..70.582 rows=1000000 loops=1)
 Planning time: 0.094 ms
 Execution time: 286.971 ms
(8 rows)


And again with the composite (year, month) method

EXPLAIN ANALYZE
  SELECT year, month, count(*)
  FROM foo
  GROUP BY year, month
  ORDER BY year, month;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=23064.75..23070.75 rows=2400 width=4) (actual time=336.826..336.908 rows=2400 loops=1)
   Sort Key: year, month
   Sort Method: quicksort  Memory: 209kB
   ->  HashAggregate  (cost=22906.00..22930.00 rows=2400 width=4) (actual time=335.757..336.060 rows=2400 loops=1)
         Group Key: year, month
         ->  Seq Scan on foo  (cost=0.00..15406.00 rows=1000000 width=4) (actual time=0.010..70.468 rows=1000000 loops=1)
 Planning time: 0.098 ms
 Execution time: 337.027 ms
(8 rows)


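Conclusion

In general, let smart people do the hard work. Date math is hard, and my clients don't pay me enough. I used to run these tests. I was hard pressed to ever conclude that I could get better results than with date. I stopped trying.

Update

@a_horse_with_no_name suggested using WHERE (year, month) between (2013, 12) and (2014,2) for my within-one-month test. In my opinion, while cool, that is a more complex query and I'd rather avoid it unless there is a gain. Alas, although it is close, it is still slower, and the closeness is really the takeaway from this test. It simply doesn't matter much.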

EXPLAIN ANALYZE
  SELECT *
  FROM foo
  WHERE (year, month) between (2013, 12) and (2014,2);

                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=5287.16..15670.20 rows=248852 width=14) (actual time=0.753..2.157 rows=1226 loops=1)
   Recheck Cond: ((ROW(year, month) >= ROW(2013, 12)) AND (ROW(year, month) <= ROW(2014, 2)))
   Heap Blocks: exact=1083
   ->  Bitmap Index Scan on foo_year_month_idx  (cost=0.00..5224.95 rows=248852 width=0) (actual time=0.550..0.550 rows=1226 loops=1)
         Index Cond: ((ROW(year, month) >= ROW(2013, 12)) AND (ROW(year, month) <= ROW(2014, 2)))
 Planning time: 0.099 ms
 Execution time: 2.249 ms
(7 rows)


Comment: Unlike some other RDBMS (see page 45 of http://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way), Postgres also fully supports index access with row values: http://stackoverflow.com/a/34291099/939860 As an aside, I fully agree: `date` is the way to go in most cases.
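Row values compare column by column, left to right, which is why the `(year, month) between ...` form behaves like a range over months. A self-contained sketch with literal values (no table needed; the aliases are illustrative):

```sql
-- Row-wise comparison: the first column decides unless it ties, then the second.
SELECT ROW(2014, 1)  BETWEEN ROW(2013, 12) AND ROW(2014, 2) AS jan_2014,  -- true
       ROW(2013, 11) BETWEEN ROW(2013, 12) AND ROW(2014, 2) AS nov_2013,  -- false
       ROW(2014, 3)  BETWEEN ROW(2013, 12) AND ROW(2014, 2) AS mar_2014;  -- false
```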

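As an alternative to the method proposed by Evan Carroll, which I consider probably the best option, I have on some occasions (and not especially when using PostgreSQL) used just a year_month column of type INTEGER (4 bytes), computed as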

 year_month = year * 100 + month


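That is, you encode the month in the two rightmost decimal digits of the integer (digit 0 and digit 1), and the year in digits 2 through 5 (or more, if needed).

To some extent, this is a poor man's alternative to building your own year_month type and operators. Compared with having two separate columns, it has some advantages, mostly "clarity of intent" and some space savings (not in PostgreSQL, I think), and also some inconveniences.

You can guarantee that the values are valid by adding a

CHECK ((year_month % 100) BETWEEN 1 AND 12)   /*  % = modulus operator */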


You can have a WHERE clause that looks like this:

year_month BETWEEN 201610 and 201702


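and have it work efficiently (provided the year_month column is properly indexed, of course).

You can group by year_month the same way you would with a date, and with the same efficiency (at least).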
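A minimal sketch of these pieces together, assuming a hypothetical table `events_ym` (the table, its columns and the generated sample rows are made up for illustration):

```sql
-- Hypothetical table storing year * 100 + month in a single integer.
CREATE TABLE events_ym (
  id         serial  PRIMARY KEY,
  year_month integer NOT NULL
    CHECK (year_month % 100 BETWEEN 1 AND 12)  -- month part must be 1..12
);
CREATE INDEX ON events_ym (year_month);

-- One row for every month of 2016 and 2017.
INSERT INTO events_ym (year_month)
SELECT y * 100 + m
FROM generate_series(2016, 2017) AS y,
     generate_series(1, 12)      AS m;

-- Range filter and grouping across a year boundary.
SELECT year_month, count(*)
FROM events_ym
WHERE year_month BETWEEN 201610 AND 201702
GROUP BY year_month
ORDER BY year_month;
```

The range works across the year boundary because the CHECK constraint guarantees that values such as 201613 or 201700 can never be stored.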

If you need to separate the year and the month, the calculation is straightforward:

month = year_month % 100    -- % is modulus operator
year  = year_month / 100    -- / is integer division


What is inconvenient: if you want to add 15 months to a year_month, you have to compute (if I have not made a mistake or oversight):

year_month + delta (months) = ...

    /* intermediate calculations, for delta >= 0 */
    year  = year_month/100 + delta/12                /* years we had + whole new years */
            + (year_month%100 + delta%12 - 1) / 12   /* do the extra months roll over into 1 more year? */
    month = (year_month%100 + delta%12 - 1) % 12 + 1

/* final result */
... = year * 100 + month


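If you're not careful, this is easy to get wrong.

If you want the number of months between two year_months, you need to do some similar calculations. That is (with a lot of simplification) what really happens under the hood with date arithmetic, which is fortunately hidden from us behind already defined functions and operators.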
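One way to keep this arithmetic less error-prone is to convert to a running month count, do the math there, and convert back. The sketch below assumes positive (AD) years so the month count never goes negative; the function names `ym_add_months` and `ym_diff_months` are hypothetical helpers, not built-ins.

```sql
-- Add delta months (may be negative) to a year*100+month value.
-- Assumes AD years, so the intermediate month count stays non-negative.
CREATE OR REPLACE FUNCTION ym_add_months(ym integer, delta integer)
RETURNS integer AS $$
  SELECT (total / 12) * 100 + (total % 12) + 1
  FROM (SELECT (ym / 100) * 12 + (ym % 100 - 1) + delta AS total) AS t
$$ LANGUAGE sql IMMUTABLE;

-- Number of months from b to a.
CREATE OR REPLACE FUNCTION ym_diff_months(a integer, b integer)
RETURNS integer AS $$
  SELECT (a / 100 - b / 100) * 12 + (a % 100 - b % 100)
$$ LANGUAGE sql IMMUTABLE;

SELECT ym_add_months(201610, 15)      AS plus_15,   -- 201801
       ym_add_months(201701, -1)      AS minus_1,   -- 201612
       ym_diff_months(201702, 201610) AS diff;      -- 4
```

With a real date column the same results come from `+ interval '15 months'` and friends, with no helper functions to maintain.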

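If you need a lot of these operations, using year_month is not too practical. If you don't, it is a very clear way of making your intent explicit.

As an alternative, you could define a year_month type, with an operator year_month + interval and another year_month - year_month, and hide the calculations. I have never actually made such heavy use of this as to feel the need in practice. A date - date is in fact hiding something similar from you.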

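As an alternative to joanolo's method =) (sorry, I was busy but wanted to write this)

Bit by bit

We're going to do the same thing, but with bits. An int4 in PostgreSQL is a signed integer, ranging from -2147483648 to +2147483647

Here is an overview of our structure.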

               bit                
----------------------------------
 YYYYYYYYYYYYYYYYYYYYYYYYYYYYMMMM


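Storing the month.

A month needs 12 options; pow(2,4) = 16 covers that, so 4 bits.
The rest we use for the year: 32 - 4 = 28 bits.

Here is the bit map of where the month is stored.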

               bit                
----------------------------------
 00000000000000000000000000001111


Months, from 1 (January) to 12 (December)

               bit                
----------------------------------
 00000000000000000000000000000001
               bit                
----------------------------------
 00000000000000000000000000001100


Years. The remaining 28 bits allow us to store our year information

SELECT (pow(2,28)-1)::int;
   int4    
-----------
 268435455
(1 row)


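At this point we need to decide how we want to do this. For our purposes we can use a static offset: if we only need to cover up to 5,000 AD, we can go back as far as 268,430,455 BC, which covers pretty much the entire Mesozoic and everything useful since.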

SELECT (pow(2,28)-1)::int4::bit(32) << 4;
               year               
----------------------------------
 11111111111111111111111111110000


And now we have the rudiments of our type, set to expire in 5000 AD, roughly 3,000 years from now.

So let's get to work making some functions.

CREATE DOMAIN year_month AS int4;

CREATE OR REPLACE FUNCTION to_year_month (cstring text)
RETURNS year_month
AS $$
  SELECT (
    ( ((date[1]::int4 - 5000) * -1)::bit(32) << 4 )
    | date[2]::int4::bit(32)
  )::year_month
  FROM regexp_split_to_array(cstring,'-(?=\d{1,2}$)')
    AS t(date)
$$
LANGUAGE sql
IMMUTABLE;

CREATE OR REPLACE FUNCTION year_month_to_text (ym year_month)
RETURNS text
AS $$
  SELECT ((ym::bit(32) >>4)::int4 * -1 + 5000)::text ||
  '-' ||
  (ym::bit(32) <<28 >>28)::int4::text
$$ LANGUAGE sql
IMMUTABLE;


A quick test shows this working:

SELECT year_month_to_text( to_year_month('2014-12') );
SELECT year_month_to_text( to_year_month('-5000-10') );
SELECT year_month_to_text( to_year_month('-8000-10') );
SELECT year_month_to_text( to_year_month('-84398-10') );


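Now we have functions that we can use on our binary type.

We could chop one more bit off the signed portion, store the year as a positive number, and then have it sort naturally as a signed int. If speed mattered more than storage space, that is the route we would have gone down. But for now, we have a date that works with the Mesozoic.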
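To make the layout and the sorting caveat concrete, here is a sketch of the same packing done with plain integer operators rather than the `bit(32)` casts. It assumes the same 5000 offset as the functions above and only covers in-range years; the aliases are illustrative.

```sql
-- Pack (5000 - year) into the high bits and the month into the low 4 bits.
WITH p AS (
  SELECT ((5000 - 2014) << 4) | 12 AS dec_2014,
         ((5000 - 2015) << 4) | 1  AS jan_2015
)
SELECT dec_2014,                                        -- 47788
       5000 - (dec_2014 >> 4) AS year,                  -- 2014
       dec_2014 & 15          AS month,                 -- 12
       dec_2014 > jan_2015    AS earlier_sorts_higher   -- true
FROM p;
```

The last column shows why the stored integers do not sort chronologically: because the year is stored as an offset counted down from 5000, a later year produces a smaller number.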

I may update this later, just for fun.
