Django and Postgres - Percentiles (median) and GROUP BY

Duš*_*ďar 4 python django postgresql statistics subquery

I need to calculate period medians per seller ID (see the simplified model below). The problem is that I am unable to construct the ORM query.

Model

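from django.contrib.postgres.fields import ArrayField, JSONField
from django.db import models


class MyModel(models.Model):
    period = models.IntegerField(null=True, default=None)
    seller_ids = ArrayField(models.IntegerField(), default=list)
    aux = JSONField(default=dict)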

Query

queryset = (
    MyModel.objects.filter(period=25)
    .annotate(seller_id=Func(F("seller_ids"), function="unnest"))
    .values("seller_id")
    .annotate(
        duration=Cast(KeyTextTransform("duration", "aux"), IntegerField()),
        median=Func(
            F("duration"),
            function="percentile_cont",
            template="%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)",
        ),
    )
    .values("median", "seller_id")
)

Source for the ArrayField aggregation (seller_id)


I think what I need to do is something along the following lines:

select t.*, p_25, p_75
from t join
     (select district,
             percentile_cont(0.25) within group (order by sales) as p_25,
             percentile_cont(0.75) within group (order by sales) as p_75
      from t
      group by district
     ) td
     on t.district = td.district

Source of the example above


Python 3.7.5, Django 2.2.8, Postgres 11.1

GCr*_*Cru 6

You can create a Median subclass of the Aggregate class, as Ryan Murphy did ( https://gist.github.com/rdmurphy/3f73c7b1826cacee34f6c2a855b12e2e ). Median then works just like Avg:

    from django.db.models import Aggregate, FloatField


    class Median(Aggregate):
        function = 'PERCENTILE_CONT'
        name = 'median'
        output_field = FloatField()
        template = '%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)'

Then find the median of a field using

    my_model_aggregate = MyModel.objects.all().aggregate(Median('period'))

The result is then available as my_model_aggregate['period__median'].
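
To sanity-check what the database returns per seller, it may help to have a plain-Python reference for what PERCENTILE_CONT(0.5) computes after the seller_ids array is unnested. The following is only a sketch under assumptions: the helper names percentile_cont and median_per_seller are hypothetical, rows are assumed to be (seller_ids, duration) pairs already pulled from the table, and an empty group raises an error here whereas Postgres would return NULL.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def percentile_cont(values: Iterable[float], fraction: float) -> float:
    """Continuous percentile: linear interpolation between neighbouring sorted values."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    if not ordered:
        # Assumption: raise instead of returning NULL/None as Postgres would.
        raise ValueError("percentile_cont needs at least one value")
    # Zero-based fractional position of the requested percentile.
    position = fraction * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


def median_per_seller(
    rows: Iterable[Tuple[List[int], Optional[float]]]
) -> Dict[int, float]:
    """Group durations by each seller ID in the array, then take the median."""
    grouped: Dict[int, List[float]] = defaultdict(list)
    for seller_ids, duration in rows:
        if duration is None:
            continue  # aggregates ignore NULL inputs
        for seller_id in seller_ids:  # equivalent of unnest(seller_ids)
            grouped[seller_id].append(duration)
    return {sid: percentile_cont(vals, 0.5) for sid, vals in grouped.items()}


if __name__ == "__main__":
    sample = [([1, 2], 10), ([1], 20), ([2], 30), ([2], 50)]
    print(median_per_seller(sample))
```

A row with several seller IDs contributes its duration to every one of those sellers, which is the same effect the unnest step has in the SQL version.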