Oracle - 部分可为空的复合外键

The*_*ler 5 sql oracle foreign-keys

我有一个旧的 Oracle 数据库,它有一个我想要理解的奇怪的怪癖。它有一个复合外键,其中一些列可以为空。对我来说,这听起来像是一个粗心的开发人员的糟糕设计,但我想征求意见。当然,原来的开发团队早已不复存在。

该表在列方面要大得多,但我认为我能够在下面的示例中提炼问题:

create table quadrant (
  region number(9) not null,
  area number(9) not null,
  caption varchar2(20),
  primary key (region, area)
);

insert into quadrant (region, area, caption) values (10, 123, 'Chicago');
insert into quadrant (region, area, caption) values (10, 125, 'Wisconsin');

create table farm (
  id number(9),
  region_id number(9) not null,
  area_id number(9),
  name varchar2(50),
  constraint fk_region_area foreign key (region_id, area_id)
    references quadrant (region, area)
);

insert into farm (id, region_id, area_id, name) values (5, 10, null, 'farm 1');
insert into farm (id, region_id, area_id, name) values (6, 11, null, 'farm 2');

select * from farm;
Run Code Online (Sandbox Code Playgroud)

结果:

ID  REGION_ID  AREA_ID  NAME
--  ---------  -------  ------
5   10         <null>   farm 1  <-- Does it point to anything?
6   11         <null>   farm 2  <-- Region 11 doesn't even exist!
Run Code Online (Sandbox Code Playgroud)

如果外键的一部分为空,那么它有什么意义?

  • REGION_ID, AREA_ID = (10, null)指向任何东西,还是只是无用的信息?
  • REGION_ID, AREA_ID = (11, null)指向什么?我不这么认为。

我很想添加一个约束来强制 ALL 或 NONE 就外键的空值而言。是否有意义?

但最重要的是,这个“功能”的用例是什么?

The*_*ler 8

Thanks to all answers and comments. This issue I have forced me to learn something new, and that's a good thing. @philipxy gave me the big clue. I wanted to recap what I learned since it will probably be of use to someone else, and this is a good place to record it.

This question has two sides: First, what a partially null foreign key means, and second how it's implemented.

Meaning of a Partially Null Foreign Keys

There's a lot of debate of what does this mean -- as @agiles231 points out. NULL could mean:

  • a value is unknown.
  • others say it means the value is invalid.
  • others say NULL is a bona fide value per se.

In short, there's no clear answer to its meaning as of yet.

I guess depending on how people interpret null values, then the strategy to use them in foreign keys (and to validate them) could be different.

Implementation of Partially Null Foreign Keys

The SQL-92 Standard defines (section 4.10.2) three different ways of matching composite foreign keys with nullable values:

  • Match SIMPLE: If any column of the composite foreign key is null, then the foreign key is accepted, stored, but not validated against the referenced table. This is usually the default mode databases offer. In the SQL-92 standard this mode is described but not named.

  • Match PARTIAL: If any column of the composite foreign key is null, then each non-null column is matched against the referenced table to check there is at least one row where the value is present. I've seen no database implementing this mode.

  • Match FULL: Partially null foreign keys are not accepted. Either the foreign key is fully null or fully not null. When null, there's no validation against the referenced table. When not null, it's fully validated against the referenced table. This is what I was expecting as a default behavior (in my blissful ignorance).

Well, I checked how 10 different databases implemented these modes and here's what I found:

Database Engine  Match SIMPLE  Match PARTIAL  Match FULL
---------------  ------------  -------------  ----------
Oracle 12c1      YES*1         NO             NO
DB2 10.5         YES*1         NO             NO
PostgreSQL 10    YES*1         NO             YES
SQL Server 2014  YES*1         NO             NO
MariaDB 10.3     YES*1         NO*2           NO*2
MySQL 8.0        YES*1         NO*2           NO*2
Sybase ASE 16    YES*1         NO             YES
H2 1.4           YES*1         NO             NO
Derby 10.13      YES*1         NO             NO
HyperSQL 2.3     YES*1         NO             YES
Run Code Online (Sandbox Code Playgroud)

*1 This is the default mode.

*2 Accepted when creating a table, but ignored.

In short:

  • All tested databases behave the same way by default: they default to Match SIMPLE.

  • No database I tested supports Match PARTIAL. I guess it makes sense since I personally find little use for it. Moreover, it could become prohibitively expensive to perform partial validation on separate foreign key columns without creating all possible index combinations on the referenced table.

  • PostgreSQL implements Match FULL as well as Sybase ASE. That's great news! Surprisingly HyperSQL (this tiny database) does too.

Workaround to implement Match FULL

The good news is there's an rather easy workaround to implement Match FULL if you happen to need it, in any of the tested databases. Just add a table constraint that allows either all null columns, or all non-null. Something like:

create table farm (
  id int,
  region_id int,
  area_id int,
  name varchar(50),
  constraint fk_region_area foreign key (region_id, area_id)
    references quadrant (region, area),
  constraint fkfull_region_area check ( -- here's the workaround
    region_id is null and area_id is null or
    region_id is not null and area_id is not null)
);

insert into farm (id, region_id, area_id, name) values (5, 10, null, 'farm 1'); -- fails

insert into farm (id, region_id, area_id, name) values (6, 11, null, 'farm 2'); -- fails

insert into farm (id, region_id, area_id, name) values (7, 10, 125, 'farm 3'); -- succeeds

insert into farm (id, region_id, area_id, name) values (8, null, null, 'farm 4'); -- succeeds
Run Code Online (Sandbox Code Playgroud)

It works pretty neat.

Finally, and as a very personal opinion, I would have expected Match FULL to be the default matching strategy. Maybe it's just that to me allowing (by default) foreign keys that do not point to other rows encourages errors in the applications that use the database.

I think that most developers will understand FULL easily, compared to SIMPLE. And PARTIAL is way more complicated, and potentially error prone. Just my opinion.


小智 1

关于使用 null 来表示某些内容存在很多争论。有些人认为 null 意味着该值未知或代表无效,其他人则认为它本身就是一个实际值。我怀疑在这种情况下,它代表着未知。假设您正在记录 100 年前某个县农场的位置。使用一些当地历史书籍,您已经绘制了该时期 70% 的现有农场及其确切边界(或附近),但对于剩余的 30%,有些有已知的区域,有些只是已知存在。在这种情况下,我肯定会说空外键是有意义的。这根本就是未知的信息。