GRo*_*tar 2 postgresql performance database-design optimization timestamp query-performance
I am building a PostgreSQL database and I have created a timestamp table, where the primary key is the timestamp itself (e.g. id: Fri Apr 13 2018 15:00:19). The database is supposed to be later migrated to a data warehouse, from which analytics will be extracted.
At this point, I am wondering whether it is beneficial to add extra columns to the timestamp table, containing the parsed components as in the example below, or to have a single table with just the IDs.
id | year | month | day | hour | minutes | seconds
-------------------------------------------------------------------------
Fri Apr 13 2018 15:00:19 | 2018 | 4 | 13 | 15 | 0 | 19
vs
id
-------------------------
Fri Apr 13 2018 15:00:19
My goal is to achieve the best possible performance when querying the data warehouse, so I'm assuming that storing the timestamp already split into its components will result in faster queries than extracting those components at query time:
SELECT * FROM timestamp_table WHERE year = 2018 /* Querying values already parsed */
vs
SELECT * FROM timestamp_table WHERE EXTRACT(YEAR FROM timestamp_id) = 2018 /* Parsing in real-time */
I would appreciate some best practices input on this.
Keep the timestamp and don't add columns for the parts.
If you need to search for part of a timestamp, you can always create indexes on extract expressions.
Having separate columns wastes space and adds unnecessary redundancy, and I cannot imagine any benefit.
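The expression-index suggestion above can be sketched roughly as follows. This is only an illustration under assumptions: the column `id` is taken to be of type `timestamp` (without time zone), and the index name is made up. With `timestamptz`, `extract` depends on the session time zone and is not immutable, so it could not be indexed this way.

```sql
-- Assumption: id is "timestamp without time zone"; the question does not state the type.
CREATE TABLE timestamp_table (
    id timestamp PRIMARY KEY
);

-- Expression index on the year part (hypothetical index name).
-- The extra pair of parentheses marks this as an expression, not a column.
CREATE INDEX timestamp_table_year_idx
    ON timestamp_table ((extract(year FROM id)));

-- The predicate uses the same expression as the index, so the planner may use it.
SELECT * FROM timestamp_table
WHERE extract(year FROM id) = 2018;

-- A plain range condition needs no extra index; it can use the primary key index.
SELECT * FROM timestamp_table
WHERE id >= '2018-01-01' AND id < '2019-01-01';
```

For filters on a whole year, month, or day, the range form is often enough, which is one reason the extra columns may not be needed.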
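You seem to be optimizing prematurely: you should not assume the performance characteristics of any particular design, but test it instead.
When you store the components of a timestamp value in separate columns, you probably won't see a noticeable performance benefit, but you do increase the risk of data inconsistency, maintenance overhead, or both.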
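One way to test rather than assume, sketched here with hypothetical scratch tables and synthetic data (none of these names come from the question), is to load both designs with the same rows and compare the plans and timings that `EXPLAIN (ANALYZE, BUFFERS)` reports:

```sql
-- Design A: timestamp only (hypothetical scratch table).
CREATE TABLE ts_only (
    id timestamp PRIMARY KEY
);

-- Design B: timestamp plus a redundant year column with its own index.
CREATE TABLE ts_split (
    id   timestamp PRIMARY KEY,
    year integer NOT NULL
);
CREATE INDEX ts_split_year_idx ON ts_split (year);

-- Synthetic data: one row per minute over two years (an arbitrary choice).
INSERT INTO ts_only (id)
SELECT g
FROM generate_series(timestamp '2017-01-01',
                     timestamp '2018-12-31 23:59:00',
                     interval '1 minute') AS g;

INSERT INTO ts_split (id, year)
SELECT id, extract(year FROM id)::integer
FROM ts_only;

-- Refresh planner statistics before measuring.
ANALYZE ts_only;
ANALYZE ts_split;

-- Compare the actual plans and timings of the two approaches.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM ts_only
WHERE id >= '2018-01-01' AND id < '2019-01-01';

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM ts_split
WHERE year = 2018;
```

The results depend on data volume, hardware, and the actual queries, so the measurement is only meaningful when run against a realistic workload.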
That said, there may be legitimate reasons to store some components of a timestamp as separate columns, for example: