One or many tables for many different but interacting events?

Dam*_*mon 4 sql events database-design

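I am creating an application whose core function is tracking various data over time (blood glucose levels, insulin doses, food intake, etc.), and I am trying to decide how best to organise this information in a database.

At its most basic, everything under this particular umbrella is an event, so I thought of having a single Events table with fields for all the properties that might come up. This could be unwieldy, though, because the vast majority of fields would end up blank for many of the entries; but I am not sure whether that is actually a problem. The benefit of this approach is that it would be easier to call and display all the events. But since many of the events have only "timestamp" in common, I question whether they belong in the same table.

I am not sure it makes sense to have a table for every kind of event, because taken separately most events have only one property other than the timestamp, and they will often have to be intermingled. (Many types of data often, but not always, come in a group.)

Some types of events have durations. Some are comparatively rare. One type of event is generally a rate that stays the same unless the rate is changed for good or temporarily overridden (those are the ones I am most worried about). Some are simple binary tags (which I was planning to handle with linking tables, but for simplicity I would need/prefer a universal event_id to link them with).

My inclination is that it is best to have a few tables with closely related types of information, rather than one table with everything and a lot of empty space... but I am not quite sure how to proceed.

I would love some advice on strategies for determining the best approach in a situation like this.

Edit: here is a rough description of the types of data I am dealing with, in case it makes things clearer.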

events:
-blood glucose 
     timestamp
     value 
     (tagged w/: from pump, manually entered
     [pre-meal, post-meal (breakfast, lunch, dinner) before bed, fasting, hypo, high, hyper  - which will be either manually entered or inferred based on settings or other user entries], before/after exercise etc i imagine would be better off dynamically generated with queries as necessary. though could apply same paradigm to the meals?

-sensor glucose (must be separate bc it is not as reliable so will be different number from regular bg test, also unlikely to be used by majority of users.)
     timestamp
     amount

-bolus 
     (timestamp)
     bolus total
     food total
     correction total 
     active insulin**
     bolus type - normal[vast majority] square wave or dual wave

-food
     (timestamp)
     carb amount
     carb type (by weight or exchanges) <- this could probably be in user settings table
     food-description
     carb-estimated (binary) 
     meal? - or separate table.
     (accompanying bolus id? though that seems too finicky)

-meals
     timestamp
     mealname (breakfast, lunch, supper) (or mealnames table? seems excessive?)

-basal
     timestamp
     rate per hour
     rate changes throughout day on regular pattern, so either automatically fill in from 'last activated pattern' (in the form midnight: 0.7/hr, 7am: 0.9/hr, 12pm: 0.8/hr etc)
     create new pattern whenever one is used

-temp basal
     (regular basal pattern can be overridden with temporary basal)
     temp basal start
     ?temp basal end and/or temp basal duration
     temp basal amount
     temp basal type -> either in % or specific rate.

-exercise
     start-time
     end-time
     intensity
     ?description (unless 'notes' is universal for any event)

-pump rewind (every 3 days or so)
     -time

-pump prime
     -amount
     -type (fixed or manual)

-pump suspended
     start-time
     end-time

-keytones
     time
     result

-starred
     event

-flagged
     event

-notes
     timestamp
     (user can place a note with any event to provide details or comments, but might want a note where there is no data as well.)

(i want a way for users to flag specific events to indicate they are result of error or otherwise suspect, and to star events as noteworthy either to discuss with doctor or to look at later)

**only place I get active insulin from is when a bolus is entered, but it could be useful other times as a constantly tracked variable, which could be calculated by looking at boluses delivered up to X time ago where X is the Active Insulin Time.

other infrequent events (likely 2-10 per year):
-HbA1C 
     time
     value
-weight
     time
     value
     units
-cholesterol
     time
     value
-blood pressure
     time
     value

-pump settings (will need to track settings changes, but should be able to do that with queries)
     -timestamp
     -bg-target
     -active insulin time
     -carb ratios (changes throughout day like basal)
     -sensitivity
     -active insulin time

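Concerns:

1) An overarching 'events' table with a type, to quickly bring back all events in a period of time without having to query every single table? (The downside is how to work with events that have durations: an optional end-time on the event table?)

2) This is a local database, which will generally have one user, and there will never be a need to compare or interact with the records of other users if it is synced online. So I was thinking of just keeping one version of the database per user, though maybe adding a "user" id as it is uploaded.

3) Many of the events will often go together for ease of interpretation and analysis (blood glucose, meal, food, bolus, notes, for instance). I gather it is better to do that after the fact with queries rather than hard-coding anything, to maintain integrity.

Some info on what the database will be used for:
- a visual representation of all data types over the course of a day
- averages of all test results, and the percentage of insulin used for food, correction and basal
- specific advanced queries, such as: list up to 20 examples of the difference between glucose levels where no food was eaten and there was no exercise within 2 hours of bed, since the settings were last changed, etc.
- the program will automatically assign tags based on parameters. For example, if > 20 carbs are eaten during the assigned "lunch" period, it will say that the food was lunch. If there are two food intakes within 30 minutes (or the "meal length" preference), it will group them as one meal... I am not totally sure how that will function right now.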

Per*_*DBA 12

V1.0

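Relational Databases, and SQL (which was designed for them), perform much better when the data is properly organised and Normalised. The one big table is un-normalised, and crippled, in terms of both performance and Relational power.

Your requirement calls for an ordinary Supertype-Subtype cluster of tables. Unfortunately, ordinary Relational structures such as this are not "common".

  • The Standard Subtype symbol is the semicircle.

    • The cardinality of the Supertype::Subtype is always 1::0-to-1.

    • The Subtype Primary Key is the Supertype Primary Key. It is also the Foreign Key to the Supertype.

  • There are two types:

    • Exclusive, where there is only one Subtype for each Supertype row, denoted with an X through the semicircle.

    • Non-exclusive, where there can be more than one Subtype per Supertype row.

  • Yours is Exclusive. This type needs a Discriminator, to identify which Subtype is active for the Supertype row. Where the number of Subtypes is small, Indicators can be used; otherwise a classification table is required.

  • Note that all of this, the structures, the rules and the constraints that are required to support it and to provide Data Integrity, is available in ordinary IEC/ISO/ANSI SQL. (The Non-SQLs do not comply with the SQL requirement.)

Data

  1. Naming is very important. We are advised to name the table by the row, not by the content or meaning or action. You speak of Events, but all I can see is Readings.

  2. There has to be a context for these Readings or Events. I do not see how EventId hangs on its own. I have assumed that the Readings are about a particular Patient. Please advise, and I will change the model.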

  3. Composite or Compound Keys are normal. SQL is quite capable (the Non-SQLs aren't). PatientId already exists as an FK in Reading, and it is used to form its PK. There is no need for an additional ReadingId column and the additional index, which would be 100% redundant.

  4. SQL is also quite capable of handling many tables (the database I am working on currently exceeds 500 tables), and large numbers of smaller tables are the nature of Relational Databases.

  5. This is pure Fifth Normal Form (no columns duplicated; no Update Anomalies).

    • This can be further Normalised to Sixth Normal Form, and thus further benefits can be gained; and the 6NF can be optimised, etc.; but all that is not required here.

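    • Some tables happen to be in 6NF, but that is a consequence, not an intent, so it cannot be declared as such.

  6. If you provide information about the limits and overrides that concern you, I can provide a model that resolves those issues.

  7. Since the data is modelled, it is already set up for very fast comparisons (generating alerts, etc.).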
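
A minimal sketch of the Exclusive Supertype-Subtype structure and the composite keys described above, run through Python's built-in sqlite3 module so that it is self-contained. The table names, column names and type codes ('G', 'B') are assumptions for illustration; they are not taken from the linked model. Exclusivity is enforced here by carrying the Discriminator into the Subtype Foreign Key, which is one common declarative technique and not necessarily the one the model itself uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request

conn.executescript("""
-- Supertype. ReadingTypeCode is the Discriminator (hypothetical codes).
CREATE TABLE Reading (
    PatientId        INTEGER NOT NULL,
    ReadingDtm       TEXT    NOT NULL,
    ReadingTypeCode  TEXT    NOT NULL CHECK (ReadingTypeCode IN ('G', 'B')),
    PRIMARY KEY (PatientId, ReadingDtm),
    UNIQUE (PatientId, ReadingDtm, ReadingTypeCode)  -- target for Subtype FKs
);

-- Subtype: its PK is the Supertype PK, and it is also the FK to the Supertype.
CREATE TABLE ReadingGlucose (
    PatientId        INTEGER NOT NULL,
    ReadingDtm       TEXT    NOT NULL,
    ReadingTypeCode  TEXT    NOT NULL DEFAULT 'G' CHECK (ReadingTypeCode = 'G'),
    GlucoseValue     REAL    NOT NULL,
    PRIMARY KEY (PatientId, ReadingDtm),
    FOREIGN KEY (PatientId, ReadingDtm, ReadingTypeCode)
        REFERENCES Reading (PatientId, ReadingDtm, ReadingTypeCode)
);

CREATE TABLE ReadingBolus (
    PatientId        INTEGER NOT NULL,
    ReadingDtm       TEXT    NOT NULL,
    ReadingTypeCode  TEXT    NOT NULL DEFAULT 'B' CHECK (ReadingTypeCode = 'B'),
    BolusTotal       REAL    NOT NULL,
    PRIMARY KEY (PatientId, ReadingDtm),
    FOREIGN KEY (PatientId, ReadingDtm, ReadingTypeCode)
        REFERENCES Reading (PatientId, ReadingDtm, ReadingTypeCode)
);
""")


def try_insert(sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# One Supertype row, declared as a glucose Reading.
conn.execute("INSERT INTO Reading VALUES (1, '2011-02-05 08:00:00', 'G')")

# The matching Subtype row is accepted.
glucose_ok = try_insert(
    "INSERT INTO ReadingGlucose (PatientId, ReadingDtm, GlucoseValue) VALUES (?, ?, ?)",
    (1, "2011-02-05 08:00:00", 5.6),
)

# A second Subtype for the same Supertype row is rejected by the FK.
bolus_ok = try_insert(
    "INSERT INTO ReadingBolus (PatientId, ReadingDtm, BolusTotal) VALUES (?, ?, ?)",
    (1, "2011-02-05 08:00:00", 4.0),
)
```

There is no surrogate ReadingId in this sketch: PatientId plus the timestamp forms the key in both the Supertype and the Subtypes.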

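Reading Data Model

Readers who are unfamiliar with the Relational Database Modelling Standard may find the IDEF1X Notation useful.

Feel free to ask clarifying questions, either as comments or as edits to your question.

Caveat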

  1. The OO and ORM crowd (led by Fowler and Ambler) are clueless about Relational technology and Databases. Designing Objects is quite different to modelling data. If you apply their Object design to databases, you will end up with monstrosities that need "re-factoring", and you will have to buy yet another "book" that shows you how to do that efficiently. In the meantime the "database" is crippled.

  2. Relational Databases that are modelled correctly (as data, not objects) never need "re-factoring". In highly Normalised Databases, you can add tables, columns and functions without having to change existing data or code.

  3. Even the concept of ORM is totally flawed. Data has more permanence than Objects. If you model the data first, then model your Objects for the data, it is very stable. But if you model your objects first (which is weird anyway, without an understanding of the data), then model the data after the Objects, you will be going back and forth, constantly correcting both.

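  4. Relational Databases have had perfectly ordinary structures such as Supertype-Subtype for over 30 years, and they work well, if they are implemented as that. They are not "gen-spec" or "class-inheritance" or any such OO thing; and if those OO or ORM structures are implemented, without modelling the data correctly, the "database" will be crippled, and need "re-factoring".

    • Further, they do not implement the required Data Integrity constraints, so the data quality is usually poor. We do not allow bad data to enter the database; their "databases" are full of bad data, and they need another "book" on how to wash dirty data.

  5. They have the sequence, and the hierarchy, mixed up. Done correctly, there is no "impedance mismatch", no pseudo-technical names to mask pure stupidity; to justify doing the same set of work over and over again.

So, when dealing with Relational Databases, run like hell from anyone who uses OO or ORM terminology.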

V1.1

Your Edit provides far more detail, which of course is demanded, because the context, the whole, is necessary, if data is to be modelled correctly. This incorporates all that info. However, questions remain, and some back-and-forth will be required before it can be complete. Feel free to ask questions about anything that is not absolutely clear; I am not sure exactly what the gap is until I throw something up, and you speak to it.

Event Data Model V1.1

  1. All my models are pure Relational (retain full Relational power), IDEF1X compliant and Fifth Normal Form (no Update Anomalies). All Rules (business or data/referential Integrity) that are drawn in the model can be implemented as Declaration in ISO/IEC/ANSI SQL.

  2. Never hard-code anything. My models do not require that, and any code working with the database does not have to do that. All fixed text is Normalised into Reference or Look-up tables. (That bit is incomplete; you need to fill in the gaps.)

    • A short alphabetic code is far better than an Enumeration; once you get used to it, the values and meanings become immediately recognisable.

    • Because they are PKs, and therefore stable, you can safely code:

      ... WHERE EventTypeCode = 'P'
      or
      ... WHERE EventTypeCode LIKE 'T%'

  3. I believe the DataTypes are self-evident or can be worked out easily. If not, please ask.

  4. Everything that you note as "finicky" is perfectly valid. The issue is, since you have not had a database to engage with, you did not know what should be in the database vs what should be or can be SQL code. Therefore all the "finicky" items have been provided for (the database elements); you need to construct the code. Again, if there is a gap please ask.

    • What I am saying is, working in the traditional style of I am the Data Modeller, you are the Developer, you have to ensure every item from your perspective is delivered, rather than relying on me to interpret your notes. I will be delivering a database that supports all the requirements that I can glean from your notes.
  5. One Patient per Database. Let's allow for the possibility that your system is successful, and that in the future you will have one central workhorse database, rather than limiting it to one database per patient, which would be a nightmare to administer. Let's say that you need to keep all your Patient details in one place, one version of the truth. That is what I have provided. This does not limit you in the short term from implementing one Db per patient; there is no problem at all with only one row in the Patient table.

    • Alternatively, I can strip PatientId out of all the tables, and when you grow into a central database configuration, you will require a major database upgrade.

    • Likewise, if you have Sensors or Pumps that you need to track, please identify their attributes. Any Sensor or Pump attributes would then be Normalised into those tables. If they are "one per patient" that's fine, there will be one row in those tables, unless you need to store the history of Sensors or Pumps.

  6. In V1.0 the Subtypes were Exclusive. Now they are Non-exclusive. This means we are tracking a chronology of Events, without duplication; and any single Event may consist of more than one Subtype. Eg. Notes can be inserted for any Event.

    • Before completion, the EventType list provided needs to be filled out in the form of a grid, showing (a) permitted (b) mandatory Subtypes per EventType. That will be implemented as CHECK Constraints in Event.
  7. Naming is very important. I am using ISO standard 11179 (guidelines and principles) plus my own conventions. Reading type Events are prefixed as such. Feel free to suggest changes.

  8. Units. Traditionally, we use either Metric xor US Imperial across the database, allow entry in whatever the user likes, and convert before storage. If you need a mixture, then at least we should have the UnitType specified at the Patient or Pump level, rather than allowing storage of either UnitType. If you really need either UnitType stored, changing back and forth, then yes, we need to store UnitType with each such Value.

  9. Temporal Database. You have Time Series being recorded, as well as interpreted via SQL. Big subject, so read up on it. The minimum I would ask you to read and understand is:

    Temporal Database Performance (0NF vs 5NF)

    Classic 5NF Temporal Database (Inspect the Data Model carefully)

  10. Basically the issue boils down to this:

    • Either you have a true 5NF database, no data duplication, no Update Anomalies.

      • That means, for continuous time series, only the StartDateTime is recorded. The EndDateTime is easily derived from the StartDateTime of the next row; it is not stored. Eg. Event is a continuous chronology; the EventType identifies whether the Event is a specific DateTime or a Period/Duration.

      • EndDateTime is stored only for disjoint Periods, where there are legitimate gaps between Periods; in any case it is clearly identified via the EventType. Eg. Exercise, PumpSuspended. (Incidentally, I am suggesting the patient only knows the actual, as opposed to planned, attributes, at the end of the Exercise period.)

      • Since generally there is no EndDateTime, StartDateTime is simply DateTime. Eg. EventDtm

      • This requires the use of ordinary SQL Subqueries. This is actually quite simple once the coder has a grasp on the subject. For those who don't, I have supplied a full tutorial on Subqueries in general, and using them in a Temporal context in particular, in:

      "It Is Easy When You Know How". Not coincidentally, re the very same Classic 5NF Temporal Database above.

    • XOR you have a database with EndDateTime stored (100% duplication) with every StartDateTime column, and you can use flat, slow queries. Lots of manipulating large result sets with GROUP BYs, instead of small result sets. Massive data duplication and Update Anomalies have been introduced, reducing the database to a flat file, to supply the needs of coders with limited ability (certainly not "ease of coding").

    • Therefore, consider carefully and choose, for the long term only, because this affects every code segment accessing temporal data. You do not want a re-write halfway down the track when you realise that maintaining Update Anomalies is worse than writing Subqueries.

      • Of course, I will provide the explicit requirements to support a 5NF Temporal Database, correct DataTypes, etc., to support all your identified requirements.

      • Further, if you choose 0NF, I will provide those fields, so that the Data Model is complete for your purpose.

      • In either case, you need to work out exactly the SQL code required for any given query.

  11. DataType handling is important. Do not store Time (hours, etc) as Integer or an offset. Store it only as TIME or DATETIME Datatype. If an offset, store it as Time since midnight. That will allow unrestricted SQL, and Date Arithmetic functions.

  12. Task for you. Go through the model carefully, and ensure that:

    • every non-key Attribute has a 1::1 relationship with its Primary Key

    • and that it does not have a relationship to any other PK (in some other table)

    And of course, check the Model and provide feedback.
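
To illustrate the 5NF option in item 10, here is a minimal sketch of deriving an EndDateTime from the StartDateTime of the next row with an ordinary correlated Subquery, run through Python's built-in sqlite3 module. The BasalRate table and its column names are assumptions for illustration; the rates are the pattern example given in the question. SQLite has no native DATETIME type, so ISO-8601 text (which sorts chronologically) stands in for it here; on a platform with a real DATETIME Datatype, use that instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Only the start of each period is stored; the end is never stored.
CREATE TABLE BasalRate (
    PatientId    INTEGER NOT NULL,
    StartDtm     TEXT    NOT NULL,   -- ISO-8601 text, sorts chronologically
    RatePerHour  REAL    NOT NULL,
    PRIMARY KEY (PatientId, StartDtm)
);
INSERT INTO BasalRate VALUES (1, '2011-02-05 00:00:00', 0.7);
INSERT INTO BasalRate VALUES (1, '2011-02-05 07:00:00', 0.9);
INSERT INTO BasalRate VALUES (1, '2011-02-05 12:00:00', 0.8);
""")

# EndDtm is derived: it is the earliest StartDtm that follows the current row.
# The last row has no successor, so its EndDtm is NULL (still in effect).
rows = conn.execute("""
SELECT  b.StartDtm,
        (SELECT MIN(n.StartDtm)
           FROM BasalRate n
          WHERE n.PatientId = b.PatientId
            AND n.StartDtm  > b.StartDtm) AS EndDtm,
        b.RatePerHour
  FROM  BasalRate b
 WHERE  b.PatientId = 1
 ORDER BY b.StartDtm
""").fetchall()

for start_dtm, end_dtm, rate in rows:
    print(start_dtm, end_dtm, rate)
```

Because no end value is stored, inserting a new rate change between two existing rows needs no update to either neighbour; the derived periods adjust on the next query.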

Question

Given the above explanations and guidance.

  • What is ReadingBasalTemperature.Type, list values please ?

  • What is HbA1C ?

  • What is KeyTone ?

  • Do we need (ie. Duration/Period EndDateTime):

    • ReadingBasalTemperatureEnd
    • ReadingBolusEnd
    • Basal Pattern
    • BasalTemp Pattern
    • Actually, what is a pattern, and how is it derived/compared ?
  • How is BasalTemperatureEnd (or Duration) determined ?

  • Starting position is, there is no need to store Active Insulin Duration. But you need to define how the EndDateTime is determined. Based on that, if it cannot be easily derived, and/or it is based on too many factors or changes all the time, storing an EndDateTime might be good.

  • The Pump Settings need clarification.

V1.2

Ok, I have incorporated all information you have provided in the question and the comments. Here is a progressed Data Model.

Event Data Model V1.2

There are still some issues to be resolved.

  • Use a Percentage or a Rate only, not both with an additional indicator. One can be derived from the other. I am using Rate consistently.

  • ... the only worry about the approach is that for many days the basal rate will be identical.. hence redundancy

    • That is not "redundancy". That is storage of a time series of facts, which happen to be unchanging. The queries required are straight-forward.

    • However, in advanced use, yes, you can avoid storing an unchanged fact, and instead extend the duration to include the new time interval.

  • I am still not clear re your explanation of Basal Temp. Please study the new Model. First, the patterns are now stored separately. Second, we are recording a Basal Temp Start with a Rate. Do we need a Basal Temp End (with a Rate) ?

  • "GlucoseEventType would be able to have more than one value per Glucose Result" needs more definition. Don't worry about ID keys. Just tell me about the data. For each ReadingGlucoseBlood, name the result values, and which GlucoseEventType they apply to; which are mandatory and which are optional.

  • PumpHistory.InsulinEndDateTime is the ending Instant for the Duration. Of course that is generic, the starting Instant is whatever row you compare it to. Thus it should be seconds or minutes since midnight 01 Jan 1900.

  • Check the new Event PK. Where the incoming record identifies several Events, you need to parse that, and INSERT each Event-EventSubtype row, using the same DateTime.

  • Except for Patient, there are no ID keys in this database, none are required thus far. Refer to the parent by full PK.
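
As a sketch of the first point above (store a Rate only, and derive the Percentage when it is needed), the query below compares a temporary basal Rate with the pattern Rate that was in effect when the temporary basal started. It runs through Python's built-in sqlite3 module. The BasalPattern and BasalTemp tables, their columns and the sample rows are assumptions for illustration, not the tables of the V1.2 model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BasalPattern (
    PatientId    INTEGER NOT NULL,
    StartDtm     TEXT    NOT NULL,
    RatePerHour  REAL    NOT NULL,
    PRIMARY KEY (PatientId, StartDtm)
);
CREATE TABLE BasalTemp (
    PatientId    INTEGER NOT NULL,
    StartDtm     TEXT    NOT NULL,
    RatePerHour  REAL    NOT NULL,   -- always a Rate, never a Percentage
    PRIMARY KEY (PatientId, StartDtm)
);
INSERT INTO BasalPattern VALUES (1, '2011-02-05 00:00:00', 0.7);
INSERT INTO BasalPattern VALUES (1, '2011-02-05 07:00:00', 0.9);
INSERT INTO BasalTemp    VALUES (1, '2011-02-05 09:30:00', 0.45);
""")

# The pattern row in effect is the latest one starting at or before the
# temporary basal; the Percentage is derived from the two Rates.
rows = conn.execute("""
SELECT  t.StartDtm,
        t.RatePerHour,
        ROUND(100.0 * t.RatePerHour / p.RatePerHour, 1) AS PercentOfPattern
  FROM  BasalTemp t
  JOIN  BasalPattern p
    ON  p.PatientId = t.PatientId
   AND  p.StartDtm  = (SELECT MAX(x.StartDtm)
                         FROM BasalPattern x
                        WHERE x.PatientId = t.PatientId
                          AND x.StartDtm <= t.StartDtm)
""").fetchall()

print(rows)
```

If the pump reports the temporary basal as a Percentage, the same relationship can be applied in the other direction on entry, so that only the Rate is stored.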

05 Feb 11

No feedback received re V1.2.

a lot of the data i'm getting is being pulled from an external (and somewhat disorganized) csv which groups certain event types under one row and often has events on the same second, which is as granular as it gets

That is easy to overcome. However, that means that an Instant is not an Instant. Now, I could walk you through the whole exercise, but the bottom line is simple.

  • If you really need it, we could add a SequenceNo to the PK, to make it unique. But I suspect the EventTypeCode is enough (there will not be more than one EventType per second). If not, let me know, and I will change the model.

  • Retain the meaning of an Instant as an Instant, and thus avoid departing from the architectural requirements of Temporal Databases.

  • Use EventType to afford uniqueness to the DateTime PK.

    • Keep in mind that the EventTypeCode is deployed in the Event PK, not as a Discriminator requirement, but to afford uniqueness. Thus its presence in the PK of the Subtypes is an artefact, not that of a Discriminator (which is already known by virtue of the Subtype).
  • However there is unnecessary complexity due to the Non-exclusive Subtype (there can be more than one Subtype per Supertype row).

  • Therefore I have changed it back to an Exclusive Subtype, deterministic. One EventType per Supertype row; max one Subtype.
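
The uniqueness rule above can be sketched as follows, using Python's built-in sqlite3 module: with EventTypeCode in the Primary Key, two Events of different types may share the same second, but a second Event of the same type at that second is rejected. The column names and the type codes ('G', 'B') are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Event (
    PatientId      INTEGER NOT NULL,
    EventDtm       TEXT    NOT NULL,   -- one-second resolution
    EventTypeCode  TEXT    NOT NULL,   -- in the PK to afford uniqueness
    PRIMARY KEY (PatientId, EventDtm, EventTypeCode)
);
""")


def insert_event(patient_id, event_dtm, type_code):
    """Insert one Event row; return False if the Primary Key rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Event VALUES (?, ?, ?)",
                (patient_id, event_dtm, type_code),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# One incoming record that carries two Events for the same second.
results = [
    insert_event(1, "2011-02-05 08:00:00", "G"),  # accepted
    insert_event(1, "2011-02-05 08:00:00", "B"),  # accepted: different type
    insert_event(1, "2011-02-05 08:00:00", "G"),  # rejected: same type, same second
]
print(results)
```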

Refer to Implementing Referential Integrity for Subtypes for specific information re Constraints, etc.

The change to the Data Model is too small to warrant another release. I have updated the V1.2 Data Model.

06 Mar 11

Due to upholding the "above all, be technically honest" guideline in the FAQ, and confronting misinformation as requested therein, I was suspended for my efforts (which means I will no longer correct misinformation on SO, and such posters have protected reign). Interaction with the seeker was carried on, to completion, and the Final Data Model was completed, away from SO. The progression is therefore lost to SO readers. However, it may be of value to post the result, the Final Data Model V1.16.

  • Events always have a starting Instant (Event.DateTime).
  • Events may be Durations, in which case an ending Instant (Event) is required.
  • Some Events consist of only the Supertype; others require a Subtype. This is identified in the third column of the EventType exposition.
  • The fourth column identifies the type of Event:
    • Instant or Duration
    • Duration: Conjunct or Disjunct
  • Note that the resolution of DateTime on the seeker's platform is one second, and many Events may occur in one second, but not more than one of the same EventType. EventTypeCode has therefore been included in the Event Primary Key to implement that rule. Thus it is an artefact, it is not a generic requirement for a supertype-subtype structure or for Exclusive/Non-exclusive subtypes.
  • Intended for printing on two facing US Letter pages, enlarged or not, with a gusset.