Should I split timestamp parts into separate columns?

GRo*_*tar 2 postgresql performance database-design optimization timestamp query-performance

I am building a PostgreSQL database and I have created a timestamp table, where the primary key is the timestamp itself (e.g. id: Fri Apr 13 2018 15:00:19). The database is supposed to be later migrated to a data warehouse, from which analytics will be extracted.

At this point, I am wondering whether it is beneficial to add extra columns to the timestamp table containing the parsed parts of the timestamp, as in the example below, or to keep a single column holding only the IDs.

id                       | year | month | day | hour | minutes | seconds
-------------------------------------------------------------------------
Fri Apr 13 2018 15:00:19 | 2018 |   4   | 13  |  15  |    0    |   19


vs


id
-------------------------
Fri Apr 13 2018 15:00:19

My goal is to achieve the best possible performance when querying the data warehouse, so I'm assuming that storing the timestamp parts separately will give faster queries than extracting them at query time:

SELECT * FROM timestamp_table WHERE year = 2018 /* Querying values already parsed */

vs

SELECT * FROM timestamp_table WHERE extract(year FROM timestamp_id) = 2018 /* Parsing in real time */

I would appreciate some best practices input on this.

Lau*_*lbe 6

Keep the timestamp and don't add columns for the parts.

If you need to search for part of a timestamp, you can always create indexes on extract expressions.
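As a rough sketch of that suggestion, assuming the table and column names from the question (timestamp_table, timestamp_id) and that the column is timestamp without time zone (with timestamp with time zone, extract is not immutable and cannot be indexed directly). The index name is made up for illustration:

```sql
-- Expression index on the year part; the index name is hypothetical.
-- Assumes timestamp_id is "timestamp without time zone".
CREATE INDEX timestamp_table_year_idx
    ON timestamp_table ((extract(year FROM timestamp_id)));

-- The query has to use the same expression for the index to be considered.
SELECT *
FROM timestamp_table
WHERE extract(year FROM timestamp_id) = 2018;

-- Alternative that needs no extra index: a range condition on the
-- timestamp itself can use the primary key index.
SELECT *
FROM timestamp_table
WHERE timestamp_id >= TIMESTAMP '2018-01-01'
  AND timestamp_id <  TIMESTAMP '2019-01-01';
```

Whether the planner actually picks either index depends on how selective the condition is, so it is worth checking the plan on realistic data.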

Having separate columns would waste space and add unnecessary redundancy, and I cannot imagine any benefit.

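  • Normally you should not store the same information redundantly. An exception is fine if you gain a significant performance benefit. If the question is about storing the same timestamp in several tables, that depends on your data model and queries. But I am pretty sure that you won't gain a performance benefit by storing the parts of a timestamp separately. (2 upvotes)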

mus*_*cio 5

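You seem to be engaging in premature optimization: you should not assume the performance characteristics of any particular design, but test them instead.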
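One possible way to run such a test is sketched below. The scratch table ts_test and the five-year, one-row-per-minute data set are assumptions chosen for illustration, not something from the question. Substitute data volumes and queries that resemble the real workload:

```sql
-- Hypothetical scratch table filled with one row per minute for five years.
CREATE TABLE ts_test (id timestamp PRIMARY KEY);

INSERT INTO ts_test (id)
SELECT generate_series(TIMESTAMP '2015-01-01 00:00',
                       TIMESTAMP '2019-12-31 23:59',
                       INTERVAL '1 minute');

-- Refresh planner statistics before measuring.
ANALYZE ts_test;

-- Compare the actual plans and run times of the candidate query forms.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM ts_test WHERE extract(year FROM id) = 2018;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM ts_test
WHERE id >= TIMESTAMP '2018-01-01' AND id < TIMESTAMP '2019-01-01';
```

Repeating each statement a few times and comparing against a variant of the table that has the extra columns gives a fairer picture than a single run.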

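When you store components of a timestamp value in separate columns, you will probably not gain a noticeable performance benefit, but you will increase the risk of data inconsistency or maintenance overhead (or both).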

That said, there may be valid reasons to store some components of a timestamp as separate columns, for example:

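  • The components (e.g. year, quarter, month) constitute valid dimensions in your data warehouse model.
  • Your physical database design calls for partitioning data by time interval, to ease maintenance or to improve the performance of certain operations.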
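If components do end up as separate columns for a reason like the first one, generated columns (available in PostgreSQL 12 and later) are one way to reduce the inconsistency risk, because the database derives the parts from the timestamp itself. This is only a sketch. The table name time_dim and the choice of columns are assumptions, and it again assumes timestamp without time zone:

```sql
-- Hypothetical dimension table; requires PostgreSQL 12 or later.
-- The parts are computed from id on write and cannot be set independently.
CREATE TABLE time_dim (
    id      timestamp PRIMARY KEY,
    year    integer GENERATED ALWAYS AS (extract(year    FROM id)::integer) STORED,
    quarter integer GENERATED ALWAYS AS (extract(quarter FROM id)::integer) STORED,
    month   integer GENERATED ALWAYS AS (extract(month   FROM id)::integer) STORED
);

-- Only the timestamp is supplied; the other columns are filled in.
INSERT INTO time_dim (id) VALUES (TIMESTAMP '2018-04-13 15:00:19');

SELECT id, year, quarter, month FROM time_dim;
```

The stored columns still take up space, so the trade-off described above applies unchanged. The only thing this removes is the chance of the parts drifting out of sync with the timestamp.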