Database design with change history

Del*_*ang 22 sql postgresql change-tracking

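I'm looking to design a database that keeps track of every set of changes so that I can refer back to them in the future. For example: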

Database A 

+==========+========+==========+
|   ID     |  Name  | Property |
+==========+========+==========+
|   1      |  Kyle  |    30    |
+==========+========+==========+

If I change the row's 'Property' field to 50, the row should be updated to:

1    Kyle    50

But the fact that the row's Property was 30 at some point in time should be preserved. Then, if the row is updated again to 70:

1    Kyle    70

Both facts, that the row's Property was 30 and later 50, should be preserved, so that with some query I could retrieve:

1    Kyle    30
1    Kyle    50

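It should recognize that these are the "same entry" at different points in time.

Edit: This history will need to be presented to the user at some point, so ideally there should be an understanding of which rows belong to the same "revision cluster".

What is the best way to approach this database design?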

Cha*_*ana 20

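One way is to have a MyTableNameHistory table for every table in your database, and make its schema identical to that of MyTableName, except that the primary key of the history table has one additional column named effectiveUtc, of type DateTime. For example, if you have a table named Employee,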

-- Live table: holds only the current state of each employee
Create Table Employee
(
  employeeId integer Primary Key Not Null,
  firstName varChar(20) null,
  lastName varChar(30) Not null,
  HireDate smallDateTime null,
  DepartmentId integer null
)

then the history table would be

-- Same columns as Employee, plus effectiveUtc as part of the primary key
Create Table EmployeeHistory
(
  employeeId integer Not Null,
  effectiveUtc DateTime Not Null,
  firstName varChar(20) null,
  lastName varChar(30) Not null,
  HireDate smallDateTime null,
  DepartmentId integer null,
  Primary Key (employeeId, effectiveUtc)
)

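Then you can put a trigger on the Employee table, so that every time anything is inserted, updated, or deleted in Employee, a new record is inserted into EmployeeHistory with exactly the same values for all the regular fields and the current UTC datetime in the effectiveUtc column.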
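As a rough sketch of that trigger setup, the snippet below uses Python's built-in sqlite3 module so it runs without a database server. SQLite only stands in here: PostgreSQL would need a trigger function instead, and the trigger names are made up for illustration. Two columns are additions that are not part of the answer above: historyId, a surrogate key that sidesteps primary-key collisions when two changes land on the same timestamp, and op, which marks whether a row came from an insert, update, or delete. The table is also cut down to two data columns to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    employeeId   INTEGER PRIMARY KEY,
    lastName     TEXT NOT NULL,
    departmentId INTEGER
);

-- historyId and op are assumptions added for this sketch.
CREATE TABLE EmployeeHistory (
    historyId    INTEGER PRIMARY KEY AUTOINCREMENT,
    employeeId   INTEGER NOT NULL,
    effectiveUtc TEXT NOT NULL,      -- SQLite's 'now' is already UTC
    op           TEXT NOT NULL,      -- 'I', 'U' or 'D'
    lastName     TEXT,
    departmentId INTEGER
);

CREATE TRIGGER employee_after_insert AFTER INSERT ON Employee
BEGIN
    INSERT INTO EmployeeHistory (employeeId, effectiveUtc, op, lastName, departmentId)
    VALUES (NEW.employeeId, strftime('%Y-%m-%d %H:%M:%f', 'now'), 'I',
            NEW.lastName, NEW.departmentId);
END;

CREATE TRIGGER employee_after_update AFTER UPDATE ON Employee
BEGIN
    INSERT INTO EmployeeHistory (employeeId, effectiveUtc, op, lastName, departmentId)
    VALUES (NEW.employeeId, strftime('%Y-%m-%d %H:%M:%f', 'now'), 'U',
            NEW.lastName, NEW.departmentId);
END;

-- A deleted row has no NEW values, so the last known (OLD) values are copied.
CREATE TRIGGER employee_after_delete AFTER DELETE ON Employee
BEGIN
    INSERT INTO EmployeeHistory (employeeId, effectiveUtc, op, lastName, departmentId)
    VALUES (OLD.employeeId, strftime('%Y-%m-%d %H:%M:%f', 'now'), 'D',
            OLD.lastName, OLD.departmentId);
END;
""")

# The application only ever touches Employee; the triggers fill the history.
conn.execute("INSERT INTO Employee VALUES (1, 'Kyle', 30)")
conn.execute("UPDATE Employee SET departmentId = 50 WHERE employeeId = 1")
conn.execute("UPDATE Employee SET departmentId = 70 WHERE employeeId = 1")
conn.execute("DELETE FROM Employee WHERE employeeId = 1")

history = conn.execute(
    "SELECT op, lastName, departmentId FROM EmployeeHistory ORDER BY historyId"
).fetchall()
for row in history:
    print(row)
```

Because the triggers live in the database, application code cannot forget to write history; the trade-off is that every schema change to Employee has to be mirrored in EmployeeHistory and in the triggers.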

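Then, to find the values at any point in the past, select the record from the history table whose effectiveUtc is the highest value prior to the as-of datetime you are interested in.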

 -- Latest history row for the employee that took effect before @AsOfUtcDate
 Select * from EmployeeHistory h
 Where EmployeeId = @EmployeeId
   And effectiveUtc =
    (Select Max(effectiveUtc)
     From EmployeeHistory
     Where EmployeeId = h.EmployeeId
        And effectiveUtc < @AsOfUtcDate)
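To try the as-of lookup end to end, here is a small sketch, again using Python's sqlite3 as a stand-in database. The helper name employee_as_of, the sample timestamps, and the reduced column set are assumptions for illustration; the query itself mirrors the one above, with named parameters in place of the @ variables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EmployeeHistory (
        employeeId   INTEGER NOT NULL,
        effectiveUtc TEXT NOT NULL,   -- ISO-8601 UTC text: string order equals time order
        lastName     TEXT NOT NULL,
        departmentId INTEGER,
        PRIMARY KEY (employeeId, effectiveUtc)
    )
""")
# Hypothetical sample data: three versions of the same employee.
conn.executemany(
    "INSERT INTO EmployeeHistory VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-01 09:00:00", "Kyle", 30),
        (1, "2024-02-01 09:00:00", "Kyle", 50),
        (1, "2024-03-01 09:00:00", "Kyle", 70),
    ],
)


def employee_as_of(conn, employee_id, as_of_utc):
    """Return the history row in effect just before as_of_utc, or None."""
    return conn.execute(
        """
        SELECT employeeId, effectiveUtc, lastName, departmentId
        FROM EmployeeHistory h
        WHERE h.employeeId = :employee_id
          AND h.effectiveUtc = (SELECT MAX(effectiveUtc)
                                FROM EmployeeHistory
                                WHERE employeeId = h.employeeId
                                  AND effectiveUtc < :as_of_utc)
        """,
        {"employee_id": employee_id, "as_of_utc": as_of_utc},
    ).fetchone()


print(employee_as_of(conn, 1, "2024-02-15 00:00:00"))
print(employee_as_of(conn, 1, "2023-12-31 00:00:00"))
```

When no version exists before the requested time, the subquery yields NULL, the comparison matches nothing, and the helper returns None. The composite primary key also serves as the index that makes the MAX lookup cheap.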


Den*_*rdy 6

The best approach depends on what you're doing. You will want to look deeper into slowly changing dimensions:

https://en.wikipedia.org/wiki/Slowly_changing_dimension

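Also, don't miss the tsrange type introduced in Postgres 9.2. It lets you merge start_date and end_date into a single column, and index it with a GiST index alongside an exclusion constraint to prevent overlapping date ranges.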


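Edit:

there should be an understanding of which rows belong to the same "revision cluster"

In this case, you want date ranges of some kind in your table, rather than a revision number or a live flag; otherwise you will end up duplicating related data all over the place.

On a separate note, consider keeping the audit tables apart from the live data, rather than storing everything in the same table. It is harder to implement and manage, but it makes queries on live data more efficient.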
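One possible shape for this, sketched with Python's sqlite3 so it runs anywhere: a live table plus a separate audit table in which each version carries a half-open validity range. The table names, the valid_from/valid_to columns, and the save and as_of helpers are all assumptions for illustration. In PostgreSQL the two range columns could be a single tsrange column, with an exclusion constraint doing the overlap check that this sketch leaves to application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (          -- live data only
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    property INTEGER
);
CREATE TABLE employee_audit (    -- every version, with its validity range
    id         INTEGER NOT NULL,
    name       TEXT NOT NULL,
    property   INTEGER,
    valid_from TEXT NOT NULL,    -- inclusive, ISO-8601 UTC
    valid_to   TEXT,             -- exclusive; NULL means "still current"
    PRIMARY KEY (id, valid_from)
);
""")


def save(conn, emp_id, name, prop, now_utc):
    """Write a new version: close the open range, open a new one, update live."""
    with conn:  # one transaction, so live and audit data cannot drift apart
        conn.execute(
            "UPDATE employee_audit SET valid_to = ? "
            "WHERE id = ? AND valid_to IS NULL",
            (now_utc, emp_id),
        )
        conn.execute(
            "INSERT INTO employee_audit VALUES (?, ?, ?, ?, NULL)",
            (emp_id, name, prop, now_utc),
        )
        conn.execute(
            "INSERT OR REPLACE INTO employee VALUES (?, ?, ?)",
            (emp_id, name, prop),
        )


def as_of(conn, emp_id, when_utc):
    """Return (name, property) of the version whose range contains when_utc."""
    return conn.execute(
        "SELECT name, property FROM employee_audit "
        "WHERE id = ? AND valid_from <= ? AND (valid_to IS NULL OR ? < valid_to)",
        (emp_id, when_utc, when_utc),
    ).fetchone()


save(conn, 1, "Kyle", 30, "2024-01-01 00:00:00")
save(conn, 1, "Kyle", 50, "2024-02-01 00:00:00")
save(conn, 1, "Kyle", 70, "2024-03-01 00:00:00")

print(as_of(conn, 1, "2024-02-15 00:00:00"))
print(conn.execute("SELECT * FROM employee").fetchall())
```

With explicit ranges, related rows from different audit tables belong to the same revision when their ranges contain the same instant, which is what makes the "revision cluster" question answerable without a shared revision number.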


See also this related post: Temporal database design, with a twist (live vs draft rows)


Luk*_*uke 5

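To add to Charles' answer, I would use an Entity-Attribute-Value model instead of creating a separate history table for every other table in your database.

Basically, you would create a single History table like so: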

-- One row per changed attribute, for any table in the database
Create Table History
(
  tableId varChar(64) Not Null,
  recordId varChar(64) Not Null,
  changedAttribute varChar(64) Not Null,
  newValue varChar(64) Not Null,
  effectiveUtc DateTime Not Null,
  Primary Key (tableId, recordId, changedAttribute, effectiveUtc)
)

Then you would create a History record any time data is modified in any one of your tables.

To follow your example, when you add 'Kyle' to your Employee table, you would create two records (one for each non-id attribute), and then you would create a new record every time a property changes:

History
+==========+==========+==================+==========+==============+
| tableId  | recordId | changedAttribute | newValue | effectiveUtc |
+==========+==========+==================+==========+==============+
| Employee | 1        | Name             | Kyle     | N            |
| Employee | 1        | Property         | 30       | N            |
| Employee | 1        | Property         | 50       | N+1          |
| Employee | 1        | Property         | 70       | N+2          |
+==========+==========+==================+==========+==============+
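To show how such per-attribute records turn back into a row, here is a minimal sketch using Python's sqlite3. The state_as_of helper and the concrete timestamps (standing in for N, N+1, N+2) are assumptions for illustration. It replays the changes in time order and lets later values overwrite earlier ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE History (
        tableId          TEXT NOT NULL,
        recordId         TEXT NOT NULL,
        changedAttribute TEXT NOT NULL,
        newValue         TEXT NOT NULL,
        effectiveUtc     TEXT NOT NULL,
        PRIMARY KEY (tableId, recordId, changedAttribute, effectiveUtc)
    )
""")
# The four rows from the table above, with made-up timestamps for N, N+1, N+2.
conn.executemany(
    "INSERT INTO History VALUES (?, ?, ?, ?, ?)",
    [
        ("Employee", "1", "Name", "Kyle", "2024-01-01 00:00:00"),
        ("Employee", "1", "Property", "30", "2024-01-01 00:00:00"),
        ("Employee", "1", "Property", "50", "2024-02-01 00:00:00"),
        ("Employee", "1", "Property", "70", "2024-03-01 00:00:00"),
    ],
)


def state_as_of(conn, table_id, record_id, as_of_utc):
    """Replay changes up to as_of_utc; later values overwrite earlier ones."""
    rows = conn.execute(
        "SELECT changedAttribute, newValue FROM History "
        "WHERE tableId = ? AND recordId = ? AND effectiveUtc <= ? "
        "ORDER BY effectiveUtc",
        (table_id, record_id, as_of_utc),
    )
    state = {}
    for attribute, value in rows:
        state[attribute] = value
    return state


print(state_as_of(conn, "Employee", "1", "2024-02-15 00:00:00"))
```

Note that every value comes back as text, because newValue is a single varChar column; the caller has to know each attribute's real type. Rows that share a tableId, recordId, and effectiveUtc can be treated as one revision.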

Alternatively, as a_horse_with_no_name suggested, if you don't want to store a new History record for every field change, you can store grouped changes (such as changing Name to 'Kyle' and Property to 30 in the same update) as a single record. In this case, you would need to express the collection of changes in JSON or some other blob format. This would merge the changedAttribute and newValue fields into one (changedValues). For example:

History
+==========+==========+==================================+==============+
| tableId  | recordId | changedValues                    | effectiveUtc |
+==========+==========+==================================+==============+
| Employee | 1        | {"Name": "Kyle", "Property": 30} | N            |
+==========+==========+==================================+==============+
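A minimal sketch of the grouped variant, again with Python's sqlite3 and the json module. The record_change and state_as_of helpers and the timestamps are assumptions for illustration, and the JSON is stored as plain text here; in PostgreSQL a json or jsonb column would be the more natural choice.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE History (
        tableId       TEXT NOT NULL,
        recordId      TEXT NOT NULL,
        changedValues TEXT NOT NULL,   -- JSON object: attribute -> new value
        effectiveUtc  TEXT NOT NULL,
        PRIMARY KEY (tableId, recordId, effectiveUtc)
    )
""")


def record_change(conn, table_id, record_id, changes, effective_utc):
    """Store all attributes changed by one update as a single History row."""
    with conn:
        conn.execute(
            "INSERT INTO History VALUES (?, ?, ?, ?)",
            (table_id, record_id, json.dumps(changes, sort_keys=True), effective_utc),
        )


def state_as_of(conn, table_id, record_id, as_of_utc):
    """Merge the JSON change sets in time order to rebuild the row."""
    rows = conn.execute(
        "SELECT changedValues FROM History "
        "WHERE tableId = ? AND recordId = ? AND effectiveUtc <= ? "
        "ORDER BY effectiveUtc",
        (table_id, record_id, as_of_utc),
    )
    state = {}
    for (changed_values,) in rows:
        state.update(json.loads(changed_values))
    return state


record_change(conn, "Employee", "1", {"Name": "Kyle", "Property": 30}, "2024-01-01 00:00:00")
record_change(conn, "Employee", "1", {"Property": 50}, "2024-02-01 00:00:00")
record_change(conn, "Employee", "1", {"Property": 70}, "2024-03-01 00:00:00")

print(state_as_of(conn, "Employee", "1", "2024-02-15 00:00:00"))
```

Each History row is now exactly one revision of one record, and JSON keeps numbers as numbers, so Property comes back as 50 rather than '50'.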

This is perhaps more difficult than creating a History table for every other table in your database, but it has multiple benefits:

  • adding new fields to tables in your database won't require adding the same fields to another table
  • fewer tables used
  • it's easier to correlate updates to different tables over time

One architectural benefit of this design is that you are decoupling the concerns of your app and your history/audit capabilities. This design would work just as well as a microservice using a relational or even NoSQL database that is separate from your application database.

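  • It may be more efficient to store all of the row's values in a single JSON or hstore column, rather than one row per modified column, e.g. following the pattern used in various audit triggers; see: http://okbob.blogspot.de/2015/01/most-simply-implementation-of-history.html or http://8kb.co.uk/blog/2015/01/19/copying-pavel-stehules-simple-history-table-but-with-the-jsonb-type/ or http://cjauvin.blogspot.de/2013/05/impossibly-lean-audit-system-for.html (4 upvotes)
  • @Marecky You can use either the old values or the new values. I like to use the new values and initialize the table with seed records containing the current values at the time of initialization. You could also record only the old values when a change happens and rely on the actual table for the current data. (2 upvotes)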