Django multi-table inheritance alternatives for basic data-model pattern

djv*_*jvg 12 python django inheritance data-modeling single-table-inheritance

tl;dr

Is there a simple alternative to multi-table inheritance for implementing the basic data-model pattern described below, in Django?

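Premise

Please consider the very basic data-model pattern in the image below, based on e.g. Hay, 1996.

Simply put: Organizations and Persons are Parties, and all Parties have Addresses. A similar pattern may apply to many other situations.

The important point here is that Address has an explicit relation with Party, rather than explicit relations with the individual sub-models Organization and Person.

(Image: diagram showing the basic data model)

Note that each sub-model introduces additional fields (not depicted here, but see the code example below).

This specific example has several obvious shortcomings, but that is beside the point. For the sake of this discussion, suppose the pattern perfectly describes what we wish to achieve, so the only question that remains is how to implement the pattern in Django.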

Implementation

The most obvious implementation, I believe, would use multi-table inheritance:

from django.db import models


class Party(models.Model):
    """ Note this is a concrete model, not an abstract one. """
    name = models.CharField(max_length=20)


class Organization(Party):
    """ 
    Note that a one-to-one relation 'party_ptr' is automatically added, 
    and this is used as the primary key (the actual table has no 'id' 
    column). The same holds for Person.
    """
    type = models.CharField(max_length=20)


class Person(Party):
    favorite_color = models.CharField(max_length=20)


class Address(models.Model):
    """ 
    Note that, because Party is a concrete model, rather than an abstract
    one, we can reference it directly in a foreign key.

    Since the Person and Organization models have one-to-one relations 
    with Party which act as primary key, we can conveniently create 
    Address objects setting either party=party_instance,
    party=organization_instance, or party=person_instance.

    """
    party = models.ForeignKey(to=Party, on_delete=models.CASCADE)

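This seems to match the pattern perfectly. It almost makes me believe this is what multi-table inheritance was intended for in the first place.

However, multi-table inheritance appears to be frowned upon, especially from a performance point of view, although it depends on the application. In particular, this scary, but ancient, post from one of Django's creators is quite discouraging:

In nearly every case, abstract inheritance is a better approach for the long term. I've seen more than a few sites crushed under the load introduced by concrete inheritance, so I'd strongly suggest that Django users approach any use of concrete inheritance with a large dose of skepticism.

Despite this scary warning, I guess the main point in that post is the following observation regarding multi-table inheritance:

These joins tend to be "hidden" (they're created automatically), which means that what look like simple queries often aren't.

Disambiguation: The above post refers to Django's "multi-table inheritance" as "concrete inheritance", which should not be confused with Concrete Table Inheritance on the database level. The latter actually corresponds better with Django's notion of inheritance using abstract base classes.

I guess this SO question nicely illustrates the "hidden joins" issue.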

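Alternatives

Abstract inheritance does not seem to be a viable alternative to me, because we cannot set a foreign key to an abstract model, which makes sense, because it has no table. I guess this implies that we would need a foreign key for every "child" model, plus some extra logic to simulate the single relation.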

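Proxy inheritance does not seem like an option either, as the sub-models each introduce extra fields. EDIT: On second thought, proxy models could be an option if we use Single Table Inheritance on the database level, i.e. a single table that includes all the fields from Party, Organization and Person.

GenericForeignKey relations may be an option in some specific cases, but to me they are the stuff of nightmares.

As another alternative, it is often suggested to use explicit one-to-one relations (eoto for short, here) instead of multi-table inheritance (so Party, Person and Organization would all just be subclasses of models.Model).

Both approaches, multi-table inheritance (mti) and explicit one-to-one relations (eoto), result in three database tables. So, depending on the type of query, of course, some form of JOIN is often inevitable when retrieving data.

By inspecting the resulting tables in the database, it becomes clear that the only difference between the mti and eoto approaches, on the database level, is that an eoto Person table has an id column as primary key and a separate foreign-key column to Party.id, whereas an mti Person table has no separate id column, but instead uses the foreign key to Party.id as its primary key.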
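The difference is easy to reproduce outside Django. The following sketch uses plain sqlite3 with hand-written DDL that only approximates what Django would generate; the table names (party, mti_person, eoto_person) and the party_id column name are simplifications of mine, not Django's actual output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE party (
    id INTEGER PRIMARY KEY,
    name VARCHAR(20) NOT NULL
);

-- mti: the link to the parent doubles as the primary key
CREATE TABLE mti_person (
    party_ptr_id INTEGER PRIMARY KEY REFERENCES party (id),
    favorite_color VARCHAR(20) NOT NULL
);

-- eoto: own id column, plus a separate unique foreign key to the parent
CREATE TABLE eoto_person (
    id INTEGER PRIMARY KEY,
    party_id INTEGER NOT NULL UNIQUE REFERENCES party (id),
    favorite_color VARCHAR(20) NOT NULL
);
""")


def columns(table):
    """Return (column name, is primary key) pairs for the given table."""
    rows = conn.execute(f"PRAGMA table_info({table})")
    # PRAGMA table_info rows are: cid, name, type, notnull, dflt_value, pk
    return [(row[1], bool(row[5])) for row in rows]


print(columns("mti_person"))
print(columns("eoto_person"))
```

Either way, fetching a person's name means joining the person table to party; the two layouts differ only in which column carries the primary key.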

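Question(s)

I don't think the behavior from the example (especially the single direct relation to the parent) can be achieved with abstract inheritance, can it? If it can, then how would you achieve that?

Is an explicit one-to-one relation really that much better than multi-table inheritance, except for the fact that it forces us to make our queries more explicit? To me, the convenience and clarity of the multi-table approach outweigh the explicitness argument.

Note that this SO question is very similar, but does not quite answer my questions. Moreover, the latest answer there is almost nine years old now, and Django has changed a lot since.

[1]: Hay 1996, Data Model Patterns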

djv*_*jvg 7

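While awaiting a better one, here's my attempt at an answer.

As suggested by Kevin Christopher Henry in the comments above, it makes sense to approach the problem from the database side. As my experience with database design is limited, I have to rely on others for this part.

Please correct me if I'm wrong at any point.

Data-model vs (Object-oriented) Application vs (Relational) Database

A lot can be said about the object/relational mismatch, or, more accurately, the data-model/object/relational mismatch.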

In the present context I guess it is important to note that a direct translation between data-model, object-oriented implementation (Django), and relational database implementation, is not always possible or even desirable. A nice three-way Venn-diagram could probably illustrate this.

Data-model level

To me, a data-model as illustrated in the original post represents an attempt to capture the essence of a real world information system. It should be sufficiently detailed and flexible to enable us to reach our goal. It does not prescribe implementation details, but may limit our options nonetheless.

In this case, the inheritance poses a challenge mostly on the database implementation level.

Relational database level

Some SO answers dealing with database implementations of (single) inheritance are:

These all more or less follow the patterns described in Martin Fowler's book Patterns of Enterprise Application Architecture. Until a better answer comes along, I am inclined to trust these views. The inheritance section in chapter 3 (2011 edition) sums it up nicely:

For any inheritance structure there are basically three options. You can have one table for all the classes in the hierarchy: Single Table Inheritance (278) ...; one table for each concrete class: Concrete Table Inheritance (293) ...; or one table per class in the hierarchy: Class Table Inheritance (285) ...

and

The trade-offs are all between duplication of data structure and speed of access. ... There's no clearcut winner here. ... My first choice tends to be Single Table Inheritance ...

A summary of patterns from the book is found on martinfowler.com.

Application level

Django's object-relational mapping (ORM) API allows us to implement these three approaches, although the mapping is not strictly one-to-one.

The Django Model inheritance docs distinguish three "styles of inheritance", based on the type of model class used (concrete, abstract, proxy):

  1. abstract parent with concrete children (abstract base classes): The parent class has no database table. Instead each child class has its own database table with its own fields and duplicates of the parent fields. This sounds a lot like Concrete Table Inheritance in the database.

  2. concrete parent with concrete children (multi-table inheritance): The parent class has a database table with its own fields, and each child class has its own table with its own fields and a foreign-key (as primary-key) to the parent table. This looks like Class Table Inheritance in the database.

  3. concrete parent with proxy children (proxy models): The parent class has a database table, but the children do not. Instead, the child classes interact directly with the parent table. Now, if we add all the fields from the children (as defined in our data-model) to the parent class, this could be interpreted as an implementation of Single Table Inheritance. The proxy models provide a convenient way of dealing with the application side of the single large database table.

Conclusion

It seems to me that, for the present example, the combination of Single Table Inheritance with Django's proxy models may be a good solution that does not have the disadvantages of "hidden" joins.

Applied to the example from the original post, it would look something like this:

from django.db import models


class Party(models.Model):
    """ All the fields from the hierarchy are on this class """
    name = models.CharField(max_length=20)
    type = models.CharField(max_length=20)
    favorite_color = models.CharField(max_length=20)


class Organization(Party):
    class Meta:
        """ A proxy has no database table (it uses the parent's table) """
        proxy = True

    def __str__(self):
        """ We can do subclass-specific stuff on the proxies """
        return '{} is a {}'.format(self.name, self.type)


class Person(Party):
    class Meta:
        proxy = True

    def __str__(self):
        return '{} likes {}'.format(self.name, self.favorite_color)


class Address(models.Model):
    """ 
    As required, we can link to Party, but we can set the field using
    either party=person_instance, party=organization_instance, 
    or party=party_instance
    """
    party = models.ForeignKey(to=Party, on_delete=models.CASCADE)

One caveat, from the Django proxy-model documentation:

There is no way to have Django return, say, a MyPerson object whenever you query for Person objects. A queryset for Person objects will return those types of objects.

A potential workaround is presented here.