What are the [dis]advantages of using a key/value table over nullable columns or separate tables?

Tau*_*ris 24 sql database-design schema-design

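I'm upgrading a payment management system I created a while ago. It currently has one table for each payment type it can accept. It is limited to paying for only one thing per payment, which this upgrade is meant to alleviate. I've been asking for suggestions on how I should design it, and I have these basic ideas to work from:

  1. Have one table for each payment type, with a few common columns on each. (the current design)
  2. Coordinate all payments with a central table that holds the common columns (unifying payment IDs regardless of type), and that identifies another table and row ID holding the columns specialised to that payment type.
  3. Have one table for all payment types, and null the columns that are not used for any given type.
  4. Use the central table idea, but store the specialised columns in a key/value table.

My goals are: not ridiculously slow, as self-documenting as possible, and maximum flexibility while maintaining the other goals.

I don't like 1 very much because of the duplicated columns in each table. It mirrors the payment type classes inheriting from a base class that provides functionality for all payment types... ORM in reverse?

I'm leaning towards 2 the most, because it is just as "type safe" and self-documenting as the current design. But, as with 1, to add a new payment type I need to add a new table.

I don't like 3 because of its "wasted space", and because it is not immediately clear which columns are used for which payment type. Documentation can alleviate that pain somewhat, but my company's internal tools do not have an effective way to store or find technical documentation.

The argument I was given for 4 is that it would reduce the need to change the database when adding a new payment method, but it suffers even more than 3 from the lack of explicitness. Currently, changing the database isn't a problem, but it could become a logistical nightmare if we decide to start letting customers keep their own databases.

So, of course, I have my biases. Does anyone have any better ideas? Which design do you think fits best? What criteria should I base my decision on?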

Per*_*DBA 30

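Note
This subject is being discussed, and this thread is being referenced in other threads, so I have given it a reasonable treatment; please bear with me. My intention is to provide understanding, so that you can make informed decisions rather than simplistic ones based merely on labels. If you find it intense, read it in chunks, at your leisure; come back when you are hungry, and not before.

What, Exactly, about EAV, is "Bad" ?

1 Introduction

There is a difference between EAV done properly and done badly, just as there is a difference between 3NF done properly and done badly. In our technical work, we need to be precise about exactly what works and what does not; about what performs well and what does not. Blanket statements are dangerous: they misinform people, and thus hinder progress and a common understanding of the issues concerned.

I am not for or against anything, except poor implementations by unskilled workers, and misrepresentation of the level of compliance with standards. And where I see misunderstanding, as here, I will attempt to address it.

Normalisation is also often misunderstood, so a word on that. Wikis and other free sources actually publish completely nonsensical "definitions" that have no academic basis and that carry vendor biases, so as to validate their non-standard-compliant products. Codd published his Twelve Rules for a reason. I implement a minimum of 5NF, which is more than enough for most requirements, so I will use that as the baseline. Simply put, assuming the reader understands Third Normal Form (at least that definition is not confused) ...

2 Fifth Normal Form

2.1 Definition

Fifth Normal Form is defined as:

  • every column has a 1::1 relation with the Primary Key, only
  • and with no other column, in the table or in any other table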
  • the result is no duplicated columns, anywhere; No Update Anomalies (no need for triggers or complex code to ensure that, when a column is updated, its duplicates are updated correctly).
  • it improves performance because (a) it affects fewer rows and (b) it improves concurrency due to reduced locking

I make the distinction that, it is not that a database is Normalised to a particular NF or not; the database is simply Normalised. It is that each table is Normalised to a particular NF: some tables may only require 1NF, others 3NF, and yet others require 5NF.

2.2 Performance

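There was a time when people thought that Normalisation did not provide performance, and that they had to "denormalise for performance". Thank God that myth has been debunked, and most IT professionals today realise that Normalised databases perform better. The database vendors optimise for Normalised databases, not for denormalised file systems. The truth about "denormalised" is that the database was not Normalised in the first place (and it performed badly); it was Unnormalised, and some further scrambling was done to improve performance. In order to be Denormalised, it has to be faithfully Normalised first, and that never took place. I have rewritten scores of such "denormalised for performance" databases, providing faithful Normalisation and nothing else, and they ran at least ten, and as much as a hundred, times faster. In addition, they required only a fraction of the disk space. It is so pedestrian that I guarantee the exercise, in writing.

2.3 Limitations

The limitations of 5NF, stated more fully, are:

  • it does not handle optional values, and Nulls have to be used (many designers disallow Nulls and use substitutes, but this has limitations if it is not implemented properly and consistently)
  • you still need to change DDL in order to add or change columns (and there are more and more requirements to add columns that were not initially identified, after implementation; change control is onerous)
  • although providing the highest level of performance due to Normalisation (read: elimination of duplicates and confused relations), complex queries such as pivoting (producing a report of rows, or summaries of rows, expressed as columns) and "columnar access" as required for data warehouse operations, are difficult, and those operations only, do not perform well. Note that this is due only to the SQL skill level available, and not to the engine.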

3 Sixth Normal Form

3.1 Definition

Sixth Normal Form is defined as:

  • the Relation (row) is the Primary Key plus at most one attribute (column)

It is known as the Irreducible Normal Form, the ultimate NF, because there is no further Normalisation that can be performed. Although it was discussed in academic circles in the mid nineties, it was formally declared only in 2003. For those who like denigrating the formality of the Relational Model, by confusing relations, relvars, "relationships", and the like: all that nonsense can be put to bed because formally, the above definition identifies the Irreducible Relation, sometimes called the Atomic Relation.
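To make the definition concrete, here is a minimal sketch using Python's built-in `sqlite3`. It is only an illustration, not the schema from the project described in this answer: the table and column names (`payment`, `payment_amount`, `payment_memo`) are assumptions. Each attribute table holds the key plus exactly one non-key column, and an optional value is represented by the absence of a row rather than by a stored Null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Hypothetical names; the key table carries nothing but the key.
CREATE TABLE payment (
    payment_id INTEGER PRIMARY KEY
);
-- One table per attribute: the key plus exactly one non-key column.
CREATE TABLE payment_amount (
    payment_id   INTEGER PRIMARY KEY REFERENCES payment (payment_id),
    amount_cents INTEGER NOT NULL CHECK (amount_cents > 0)
);
CREATE TABLE payment_memo (
    payment_id INTEGER PRIMARY KEY REFERENCES payment (payment_id),
    memo       TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO payment VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO payment_amount VALUES (?, ?)", [(1, 1500), (2, 900)])
# Payment 2 has no memo: no row is stored, so no Null is stored either.
conn.execute("INSERT INTO payment_memo VALUES (1, 'deposit')")

# Reassemble the wide row; the missing memo only shows up as None in the result set.
rows = conn.execute("""
    SELECT p.payment_id, a.amount_cents, m.memo
    FROM payment AS p
    JOIN payment_amount AS a ON a.payment_id = p.payment_id
    LEFT JOIN payment_memo AS m ON m.payment_id = p.payment_id
    ORDER BY p.payment_id
""").fetchall()
memo_rows = conn.execute("SELECT COUNT(*) FROM payment_memo").fetchone()[0]
print(rows, memo_rows)
```

Note that datatypes, `NOT NULL`, `CHECK` and foreign keys are still declared per attribute table, which is the sense in which 6NF keeps the 5NF constraints.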

3.2 Progression

The increment that 6NF provides (that 5NF does not) is:

  • formal support for optional values, and thus, elimination of The Null Problem
    • a side effect is, columns can be added without DDL changes (more later)
  • effortless pivoting
  • simple and direct columnar access
    • it allows for (not in its vanilla form) an even greater level of performance in this department

Let me say that I (and others) were supplying enhanced 5NF tables 20 years ago, explicitly for pivoting, with no problem at all, and thus allowing (a) simple SQL to be used and (b) providing very high performance; it was nice to know that the academic giants of the industry had formally defined what we were doing. Overnight, my 5NF tables were renamed 6NF, without me lifting a finger. Second, we only did this where we needed it; again, it was the table, not the database, that was Normalised to 6NF.

3.3 SQL Limitation

It is a cumbersome language, particularly re joins, and doing anything moderately complex makes it very cumbersome. (It is a separate issue that most coders do not understand or use subqueries.) It supports the structures required for 5NF, but only just. For robust and stable implementations, one must implement additional standards, which may consist in part, of additional catalogue tables. The "use by" date for SQL had well and truly elapsed by the early nineties; it is totally devoid of any support for 6NF tables, and desperately in need of replacement. But that is all we have, so we need to just Deal With It.

For those of us who had been implementing standards and additional catalogue tables, it was not a serious effort to extend our catalogues to provide the capability required to support 6NF structures to standard: which columns belong to which tables, and in what order; mandatory/optional; display format; etc. Essentially a full MetaData catalogue, married to the SQL catalogue.

Note that each NF contains each previous NF within it, so 6NF contains 5NF. We did not break 5NF in order to provide 6NF, we provided a progression from 5NF; and where SQL fell short we provided the catalogue. What this means is, basic constraints such as for Foreign Keys; and Value Domains which were provided via SQL Declarative Referential Integrity; Datatypes; CHECKS; and RULES, at the 5NF level, remained intact, and these constraints were not subverted. The high quality and high performance of standard-compliant 5NF databases was not reduced in any way by introducing 6NF.

3.4 Catalogue

It is important to shield the users (any report tool) and the developers, from having to deal with the jump from 5NF to 6NF (it is their job to be app coding geeks, it is my job to be the database geek). Even at 5NF, that was always a design goal for me: a properly Normalised database, with a minimal Data Directory, is in fact quite easy to use, and there was no way I was going to give that up. Keep in mind that due to normal maintenance and expansion, the 6NF structures change over time, new versions of the database are published at regular intervals. Without doubt, the SQL (already cumbersome at 5NF) required to construct a 5NF row from the 6NF tables, is even more cumbersome. Gratefully, that is completely unnecessary.

Since we already had our catalogue, which identified the full 6NF-DDL-that-SQL-does-not-provide, if you will, I wrote a small utility to read the catalogue and:

  • generate the 6NF table DDL.
  • generate 5NF VIEWS of the 6NF tables. This allowed the users to remain blissfully unaware of them, and gave them the same capability and performance as they had at 5NF
  • generate the full SQL (not a template, we have those separately) required to operate against the 6NF structures, which coders then use. They are released from the tedium and repetition which is otherwise demanded, and free to concentrate on the app logic.
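The author's utility and catalogue are proprietary and not shown, so the following is only a rough sketch of the general idea, under stated assumptions: a hypothetical catalogue table `attr_catalogue`, a hypothetical function `generate_6nf`, and a naming scheme (`<entity>_<column>` tables, `<entity>_5nf` view) invented for the example. It reads the catalogue, emits one 6NF table per attribute, and emits a view that presents the familiar wide row.

```python
import sqlite3

def generate_6nf(conn, entity):
    """Return (ddl_statements, view_sql) for one entity, driven by the catalogue.

    Identifiers come straight from the catalogue, which is assumed to be trusted.
    """
    cols = conn.execute(
        "SELECT column_name, sql_type, is_mandatory FROM attr_catalogue "
        "WHERE entity = ? ORDER BY ordinal", (entity,)).fetchall()
    key = f"{entity}_id"
    ddl = [f"CREATE TABLE {entity} ({key} INTEGER PRIMARY KEY)"]
    select_cols = [f"e.{key} AS {key}"]
    joins = []
    for name, sql_type, mandatory in cols:
        table = f"{entity}_{name}"
        ddl.append(
            f"CREATE TABLE {table} ({key} INTEGER PRIMARY KEY "
            f"REFERENCES {entity} ({key}), {name} {sql_type} NOT NULL)")
        # Simplification: mandatory attributes use an inner join, optional ones a left join.
        join = "JOIN" if mandatory else "LEFT JOIN"
        joins.append(f"{join} {table} ON {table}.{key} = e.{key}")
        select_cols.append(f"{table}.{name} AS {name}")
    view = (f"CREATE VIEW {entity}_5nf AS SELECT {', '.join(select_cols)} "
            f"FROM {entity} AS e {' '.join(joins)}")
    return ddl, view

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE attr_catalogue (
    entity       TEXT    NOT NULL,
    column_name  TEXT    NOT NULL,
    sql_type     TEXT    NOT NULL,
    is_mandatory INTEGER NOT NULL,
    ordinal      INTEGER NOT NULL,
    PRIMARY KEY (entity, column_name)
)""")
conn.executemany("INSERT INTO attr_catalogue VALUES (?, ?, ?, ?, ?)", [
    ("payment", "amount_cents", "INTEGER", 1, 1),
    ("payment", "memo", "TEXT", 0, 2),
])

ddl, view = generate_6nf(conn, "payment")
for stmt in ddl + [view]:
    conn.execute(stmt)

conn.execute("INSERT INTO payment VALUES (1)")
conn.execute("INSERT INTO payment_amount_cents VALUES (1, 1500)")
cur = conn.execute("SELECT * FROM payment_5nf")
view_columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(view_columns, rows)
```

Users and report tools would query only `payment_5nf`; adding an attribute means adding a catalogue row and regenerating, rather than hand-writing the joins.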

I did not write a utility for Pivoting because the complexity present at 5NF is eliminated, and they are now dead simple to write, as with the 5NF-enhanced-for-pivoting. Besides, most report tools provide pivoting, so I only need to provide functions which comprise heavy churning of stats, which needs to be performed on the server before shipment to the client.

3.5 Performance

Everyone has their "disease" to suffer, their cross to bear; I happen to be obsessed with Performance. My 5NF databases performed well, so let me assure you that I ran far more benchmarks than were necessary, before placing anything in production. The 6NF database performed exactly the same as the 5NF database, no better, no worse. This is no surprise, because the only thing the "complex" 6NF SQL does, that the 5NF SQL doesn't, is perform many more joins and subqueries.

You have to examine the myths.

  • Anyone who has benchmarked the issue (i.e. examined the execution plans of queries) will know that Joins Cost Nothing, it is a compile-time resolution, they have no effect at execution time.
  • Yes, of course, the number of tables joined; the size of the tables being joined; whether indices can be used; the distribution of the keys being joined; etc, all cost something.
  • But the join itself costs nothing.
  • A query on five (larger) tables in an Unnormalised database is much slower than the equivalent query on ten (smaller) tables in the same database if it were Normalised. The point is, neither the four nor the nine Joins cost anything; they do not figure in the performance problem; the selected set on each Join does figure in it.

3.6 Benefit

  1. Unrestricted columnar access. This is where 6NF really stands out. The straight columnar access was so fast that there was no need to export the data to a data warehouse in order to obtain speed from specialised DW structures.

    My research into a few DWs, by no means complete, shows that they consistently store data by columns, as opposed to rows, which is exactly what 6NF does. I am conservative, so I am not about to make any declarations that 6NF will displace DWs, but in my case it eliminated the need for one.

  2. It would not be fair to compare functions available in 6NF that were unavailable in 5NF (eg. Pivoting), which obviously ran much faster.

That was our first true 6NF database (with a full catalogue, etc; as opposed to the always 5NF with enhancements only as necessary; which later turned out to be 6NF), and the customer is very happy. Of course I was monitoring performance for some time after delivery, and I identified an even faster columnar access method for my next 6NF project. That, when I do it, might present a bit of competition for the DW market. The customer is not ready, and we do not fix that which is not broken.

3.7 What, Exactly, about 6NF, is "Bad" ?

Note that not everyone would approach the job with as much formality, structure, and adherence to standards. So it would be silly to conclude from our project, that all 6NF databases perform well, and are easy to maintain. It would be just as silly to conclude (from looking at the implementations of others) that all 6NF databases perform badly, are hard to maintain; disasters. As always, with any technical endeavour, the resulting performance and ease of maintenance are strictly dependent on formality, structure, and adherence to standards, in addition to the relevant skill set.

3.8 Availability

Please don't expose yourself and ask for anything beyond the boundaries of standard commercial practice, such as "published references", the customer is an Australian bank, the whole implementation is confidential; but I am free to take prospects on visits. You are also welcome to view (but not copy) the documentation at our offices in Sydney. The methodology (structures and standards beyond the publicly available 6NF education) and the utilities, is our proprietary Intellectual Property, and it is available for assignments. At this stage I am selling it only as part of an assignment, because (a) I need to reasonably ensure success of the project (in order not to hurt our reputation), and (b) one successful project under our belts is not enough maturity to classify it as 'ready for market'.

I am happy to continue answering questions, and providing helpful information re the 6NF catalogue, advice re what works and what doesn't, etc, without actually publishing our IP (documentation). I am also happy to run qualified benchmarks for you.

4 Entity Attribute Value

Disclosure: Experience. I have inspected a few of these, mostly hospital and medical systems. I have performed corrective assignments on two of them. The initial delivery by the overseas provider was quite adequate, although not great, but the extensions implemented by the local provider were a mess. But not nearly the disaster that people have posted about re EAV on this site. A few months intense work fixed them up nicely.

4.1 What It Is

It was obvious to me that the EAV implementations I have worked on are merely subsets of Sixth Normal Form. Those who implement EAV do so because they want some of the features of 6NF (eg. ability to add columns without DDL changes), but they do not have the academic knowledge to implement true 6NF, or the standards and structures to implement and administer it securely. Even the original provider did not know about 6NF, or that EAV was a subset of 6NF, but they readily agreed when I pointed it out to them. Because the structures required to provide EAV, and indeed 6NF, efficiently and effectively (catalogue; Views; automated code generation) are not formally identified in the EAV community, and are missing from most implementations, I classify EAV as the bastard son of Sixth Normal Form.

4.2 What, Exactly, about EAV, is "Bad" ?

Going by the comments in this and other threads, yes, EAV done badly is a disaster. More importantly, (a) they are so bad that the performance provided at 5NF (forget 6NF) is lost and (b) the ordinary isolation from the complexity has not been implemented (coders and users are "forced" to use cumbersome navigation). And if they did not implement a catalogue, all sorts of preventable errors will not have been prevented. All that may well be true for bad (EAV or other) implementations, but it has nothing to do with 6NF or EAV. The two projects I worked on had quite adequate performance (sure, it could be improved; but there was no bad performance due to EAV), and good isolation of complexity. Of course, they were nowhere near the quality or performance of my 5NF databases or my true 6NF database, but they were fair enough, given the level of understanding of the posted issues within the EAV community. They were not the disasters and sub-standard nonsense alleged to be EAV in these pages.

5 Nulls

There is a well-known and documented issue called The Null Problem. It is worthy of an essay by itself. For this post, suffice to say:

  • the problem is really the optional or missing value; here the consideration is table design such that there are no Nulls vs Nullable columns
  • actually it does not matter because, regardless of whether you use Nulls/No Nulls/6NF to exclude missing values, you will have to code for that, the problem precisely then, is handling missing values, which cannot be circumvented
    • except of course for pure 6NF, which eliminates the Null Problem
    • the coding to handle missing values remains
      • except, with automated generation of SQL code, heh heh
  • Nulls are bad news for performance, and many of us have decided decades ago not to allow Nulls in the database (Nulls in passed parameters and result sets, to indicate missing values, is fine)
    • which means a set of Null Substitutes and boolean columns to indicate missing values
  • Nulls cause otherwise fixed len columns to be variable len; variable len columns should never be used in indices, because a little 'unpacking' has to be performed on every access of every index entry, during traversal or dive.
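The answer does not spell out the Null Problem, so as one small, hedged illustration of why missing values have to be coded for: in SQL, a comparison with Null is neither true nor false, so `v = 1` and `v <> 1` together do not cover every row. The table `t` and column `v` below are made up for the example (Python's built-in `sqlite3`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")  # v is nullable on purpose
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (None,)])

total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
equal = conn.execute("SELECT COUNT(*) FROM t WHERE v = 1").fetchone()[0]
not_equal = conn.execute("SELECT COUNT(*) FROM t WHERE v <> 1").fetchone()[0]
missing = conn.execute("SELECT COUNT(*) FROM t WHERE v IS NULL").fetchone()[0]

# The Null row satisfies neither predicate, so it must be handled explicitly.
print(total, equal, not_equal, missing)  # prints: 3 1 1 1
```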

6 Position

I am not a proponent of EAV or 6NF, I am a proponent of quality and standards. My position is:

  1. Always, in all ways, do whatever you are doing to the highest standard that you are aware of.

  2. Normalising to Third Normal Form is minimal for a Relational Database (5NF for me). DataTypes, Declarative referential Integrity, Transactions, Normalisation are all essential requirements of a database; if they are missing, it is not a database.

    • if you have to "denormalise for performance", you have made serious Normalisation errors, your design is not normalised. Period. Do not "denormalise", on the contrary, learn Normalisation and Normalise.
  3. There is no need to do extra work. If your requirement can be fulfilled with 5NF, do not implement more. If you need Optional Values or ability to add columns without DDL changes or the complete elimination of the Null Problem, implement 6NF, only in those tables that need them.

  4. If you do that, due only to the fact that SQL does not provide proper support for 6NF, you will need to implement:

    • a simple and effective catalogue (column mix-ups and data integrity loss are simply not acceptable)
    • 5NF access for the 6NF tables, via VIEWS, to isolate the users (and developers) from the encumbered (not "complex") SQL
    • write or buy utilities, so that you can generate the cumbersome SQL to construct the 5NF rows from the 6NF tables, and avoid writing same
    • measure, monitor, diagnose, and improve. If you have a performance problem, you have made either (a) a Normalisation error or (b) a coding error. Period. Back up a few steps and fix it.
  5. If you decide to go with EAV, recognise it for what it is, 6NF, and implement it properly, as above. If you do, you will have a successful project, guaranteed. If you do not, you will have a dog's breakfast, guaranteed.
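As a very small sketch of what "a simple and effective catalogue" could mean at its most basic (an assumption on my part, not the author's catalogue): if attribute names live in their own table and the value table references it, a misspelled attribute name is rejected by the database instead of being stored silently. The names `attribute` and `payment_attr` are hypothetical, and this covers only name consistency, not value typing or mandatory attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
-- Hypothetical one-column catalogue of the attribute names that are allowed.
CREATE TABLE attribute (
    attr_name TEXT PRIMARY KEY
);
CREATE TABLE payment_attr (
    payment_id INTEGER NOT NULL,
    attr_name  TEXT    NOT NULL REFERENCES attribute (attr_name),
    attr_value TEXT    NOT NULL,
    PRIMARY KEY (payment_id, attr_name)
);
INSERT INTO attribute VALUES ('card_last4'), ('memo');
""")

conn.execute("INSERT INTO payment_attr VALUES (1, 'card_last4', '4242')")
try:
    # 'card_lst4' is not in the catalogue, so the foreign key rejects it.
    conn.execute("INSERT INTO payment_attr VALUES (1, 'card_lst4', '4242')")
    typo_rejected = False
except sqlite3.IntegrityError:
    typo_rejected = True

stored = conn.execute("SELECT COUNT(*) FROM payment_attr").fetchone()[0]
print(typo_rejected, stored)
```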

6.1 There Ain't No Such Thing As A Free Lunch

That adage has been referred to, but actually it has been misused. The way it actually, deeply applies is as above: if you want the benefits of 6NF/EAV, you had better be willing to do the work required to obtain it (catalogue, standards). Of course, the corollary is, if you don't do the work, you won't get the benefit. There is no "loss" of Datatypes; value Domains; Foreign keys; Checks; Rules. Regarding performance, there is no performance penalty for 6NF/EAV, but there is always a substantial performance penalty for slip-shod, sub-standard work.

7 Specific Question

Finally. With due consideration to the context above, and that it is a small project with a small team, there is no question:

  • Do not use EAV (or 6NF for that matter)
  • Do not use Nulls or Nullable columns (unless you wish to subvert performance)
  • Do use a single Payment table for the common payment columns
  • and a child table for each PaymentType, each with its specific columns
  • All fully typecast and constrained.

  • What's this "another row_id" business ? Why do some of you stick an ID on everything that moves, without checking if it is a deer or an eagle ? No. The child is a dependent child. The Relation is 1::1. The PK of the child is the PK of the parent, the common Payment table. This is an ordinary Supertype-Subtype cluster, the Differentiator is PaymentTypeCode. Subtypes and supertypes are an ordinary part of the Relational Model, and fully catered for in the database, as well as in any good modelling tool.

    Sure, people who have no knowledge of Relational databases think they invented it 30 years later, and give it funny new names. Or worse, they knowingly re-label it and claim it as their own. Until some poor sod, with a bit of education and professional pride, exposes the ignorance or the fraud. I do not know
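The Supertype-Subtype cluster recommended above can be sketched as follows, using Python's built-in `sqlite3`. The column names and the two payment types (`CARD`, `CHEQUE`) are assumptions for illustration. The child's Primary Key is the parent's Primary Key, as the answer describes; repeating `payment_type_code` in the child with a composite foreign key is one common way to make the database enforce the Differentiator, and is my addition rather than something stated in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Supertype: the columns common to every payment.
CREATE TABLE payment (
    payment_id        INTEGER PRIMARY KEY,
    payment_type_code TEXT    NOT NULL CHECK (payment_type_code IN ('CARD', 'CHEQUE')),
    amount_cents      INTEGER NOT NULL CHECK (amount_cents > 0),
    UNIQUE (payment_id, payment_type_code)
);
-- Subtypes: same key as the parent, plus the type-specific columns.
CREATE TABLE card_payment (
    payment_id        INTEGER PRIMARY KEY,
    payment_type_code TEXT NOT NULL DEFAULT 'CARD' CHECK (payment_type_code = 'CARD'),
    card_last4        TEXT NOT NULL CHECK (length(card_last4) = 4),
    FOREIGN KEY (payment_id, payment_type_code)
        REFERENCES payment (payment_id, payment_type_code)
);
CREATE TABLE cheque_payment (
    payment_id        INTEGER PRIMARY KEY,
    payment_type_code TEXT NOT NULL DEFAULT 'CHEQUE' CHECK (payment_type_code = 'CHEQUE'),
    cheque_number     TEXT NOT NULL,
    FOREIGN KEY (payment_id, payment_type_code)
        REFERENCES payment (payment_id, payment_type_code)
);
""")

conn.execute("INSERT INTO payment VALUES (1, 'CARD', 1500)")
conn.execute("INSERT INTO card_payment (payment_id, card_last4) VALUES (1, '4242')")

try:
    # Payment 1 is a CARD payment, so a cheque subtype row for it is rejected.
    conn.execute("INSERT INTO cheque_payment (payment_id, cheque_number) VALUES (1, '000123')")
    wrong_subtype_rejected = False
except sqlite3.IntegrityError:
    wrong_subtype_rejected = True

rows = conn.execute("""
    SELECT p.payment_id, p.amount_cents, c.card_last4
    FROM payment AS p
    JOIN card_payment AS c ON c.payment_id = p.payment_id
""").fetchall()
print(wrong_subtype_rejected, rows)
```

Adding a new payment type still means adding a new child table, which is the trade-off the question raises for option 2; in exchange every column stays typed and constrained, with no nullable columns.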

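    • -1 @ Starts by saying "Blanket statements are dangerous, misinform people..." and then goes on to make blanket statements without providing evidence to support the claims. Defines 5NF and 6NF in unusual ways without links, so it is not clear where the definitions come from. A very long post without a single link. Also, some of this work sounds like it comes from CJ Date, but there is no attribution. (7 upvotes)
    • @Eamon: Either join cost is a myth, or joins are expensive on large tables; we need to examine precisely what the cost is. I stated general facts from many benchmarks, not my particular case. Joins are resolved at compile time and cost nothing at execution time. In fact you are proving what I detailed in another post re join cost; the cost is: data set size; indices or lack of them; datatypes and mismatches; SARGs/ranges; cardinality/distribution of columns ... The query may be slow, but **the join itself**, or many joins between large or small tables, **costs nothing.** (2 upvotes)
    • If you have multiple tables in your query, and thus need to do lookups in multiple tables, then "joins" are very expensive at *exec* time. Sure, the *keyword* isn't slow; you could do an implicit cross join or use subqueries, and those are slow too. Consider the context: designing a relational model and choosing between normalising (and splitting entities over multiple tables) and not normalising can make a difference. In *this* context - yes, *joins* are slow, meaning not necessarily the query plan (*which may also be slow*), but the execution time of queries that span multiple tables. (2 upvotes)
    • @Eamon: You missed my point, please read it again. Queries on Relational data will generally involve more tables, yes. Joins cost nothing. Table size costs: joining three fat tables costs more than joining two fat tables; the cost is in the table size (etc, as above). By all means, limit yourself due to your search set size; but not due to the number of joins. Further, you have evidently missed the fact that Unnormalised tables are larger; operate on larger search sets; and have more indices than Normalised tables, so you are fighting yourself. (2 upvotes)
    • @Eamon: 1. *You*, not I, are splitting entities over multiple tables, for whatever reason, but **that is Unnormalised, not Relational**. If you are serious, post a new question with the DDL so that we can resolve it, but please stop this immature nonsense. 2. I have enough experience to stand by my technical principles; evidently you are used to bending and half-implementing them (which results in fragile systems), are bothered by those who do not, and have to constantly defend your fragile position by attacking the unassailability of the principles. Sad. (2 upvotes)
    • @PerfomranceDBA Ah, but you did not object to **false** blanket statements. You just said blanket statements. For example, you wrote that you saw 10 to 100 times performance improvement. But you do not back it up with details. What changes did you make? What was the environment? Was it mysql, sybase, etc? How many records? Was the improvement uniform across reads and writes? How about memory and CPU utilisation? Here is an example of a statement that includes details: http://blog.stackoverflow.com/2010/10/database-upgrade/. See also published principles. (Books are good too.) (2 upvotes)
    • "Joins are resolved at compile time and cost nothing at execution time." This statement is absolute nonsense. If you believe this, you have never looked at an actual execution plan. (2 upvotes)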

Con*_*rix 12

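Perhaps you should look at this question

The accepted answer from Bill Karwin goes into specific arguments against the key/value table, usually known as Entity Attribute Value (EAV)

.. Although many people seem to favour EAV, I don't. It seems like the most flexible solution, and therefore the best. However, keep in mind the adage TANSTAAFL. Here are some of the disadvantages of EAV:

  • No way to make a column mandatory (equivalent of NOT NULL).
  • No way to use SQL data types to validate entries.
  • No way to ensure that attribute names are spelled consistently.
  • No way to put a foreign key on the values of any given attribute, e.g. for a lookup table.
  • Fetching results in a conventional tabular layout is complex and expensive, because to get attributes from multiple rows you need to do a JOIN for each attribute.

The degree of flexibility EAV gives you requires sacrifices in other areas, probably making your code as complex as (or worse than) it would have been to solve the original problem in a more conventional way.

And in most cases, it is unnecessary to have that degree of flexibility. In the OP's question about product types, it is much simpler to create a table per product type for product-specific attributes, so you have some consistent structure enforced at least for entries of the same product type.

I'd use EAV only if every row must be permitted to potentially have a distinct set of attributes. When you have a finite set of product types, EAV is overkill. Class Table Inheritance would be my first choice.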
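To illustrate the quoted drawbacks, here is a minimal sketch of a naive key/value table, with no catalogue or constraints behind it, using Python's built-in `sqlite3`. The table and attribute names are made up for the example. A misspelled attribute name and a badly typed value are both accepted, and rebuilding a tabular row takes one join per attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE payment_attr (
    payment_id INTEGER NOT NULL,
    attr_name  TEXT    NOT NULL,
    attr_value TEXT    NOT NULL,
    PRIMARY KEY (payment_id, attr_name)
)""")
conn.executemany("INSERT INTO payment_attr VALUES (?, ?, ?)", [
    (1, "card_last4", "4242"),
    (1, "expiry", "2030-01"),
    (2, "card_lst4", "1111"),     # misspelled attribute name: accepted silently
    (2, "expiry", "not a date"),  # wrong kind of value: accepted silently
])

# Pivot back to one row per payment: one LEFT JOIN per attribute.
rows = conn.execute("""
    SELECT k.payment_id, c.attr_value AS card_last4, e.attr_value AS expiry
    FROM (SELECT DISTINCT payment_id FROM payment_attr) AS k
    LEFT JOIN payment_attr AS c
           ON c.payment_id = k.payment_id AND c.attr_name = 'card_last4'
    LEFT JOIN payment_attr AS e
           ON e.payment_id = k.payment_id AND e.attr_name = 'expiry'
    ORDER BY k.payment_id
""").fetchall()
print(rows)
```

Payment 2 comes back with no `card_last4` because its value was stored under the misspelled name.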

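  • The five "no way" items are completely incorrect: they may be true for badly implemented EAV, but they do not reflect properly implemented EAV. There is no problem implementing all of those in EAV, in SQL. In fact 6NF (EAV is a subset of 6NF) performs faster than 3NF or 5NF, so again, the statements are not about EAV. The SQL is certainly more complex, but that is easily overcome with metadata and automated SQL generation. The tabular layout is easily provided via VIEWS. Agreed, it is not for everyone, and done badly, it stinks. (2 upvotes)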