Average of the top 5 values in a model

Blu*_*gma 8 sql django django-models amazon-redshift

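I have a Django model with a lot of fields. I am trying to get, in a single query, the average of a given field and the average of the top 5 values of that same field (this follows my other question about pure SQL: Average of top 5 value in a table for a given value (group by)). It shouldn't matter, but my database is Redshift.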

I found two different ways to do this in SQL, but I'm having trouble implementing these queries with the Django ORM.

Here is an example of what I want to do, using Cars:

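from django.db import models

class Cars(models.Model):
    manufacturer = models.CharField(max_length=100)  # max_length is required on most backends; 100 is arbitrary
    model = models.CharField(max_length=100)
    price = models.FloatField()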

Data:

manufacturer | model | price
Citroen        C1      1
Citroen        C2      2
Citroen        C3      3
Citroen        C4      4
Citroen        C5      5
Citroen        C6      6
Ford           F1      7
Ford           F2      8
Ford           F3      9
Ford           F4      10
Ford           F5      11
Ford           F6      12 
Ford           F6      19 
GenMotor       G1      20
GenMotor       G3      25
GenMotor       G4      22

Expected output:

manufacturer | average_price | average_top_5_price
Citroen        3.5             4.0
Ford           10.86           12.2
GenMotor       22.33           22.33
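A quick plain-Python check of the expected output, with no database or Django involved: group the sample rows by manufacturer, then average all prices and the five highest prices. Ford, for instance, has 7 rows summing to 76, so its overall average is 76 / 7 ≈ 10.857, which rounds to 10.86. The ROWS list and the averages() helper are illustrative names, not part of the question.

```python
from collections import defaultdict
from statistics import mean

# Sample rows from the question: (manufacturer, model, price)
ROWS = [
    ("Citroen", "C1", 1), ("Citroen", "C2", 2), ("Citroen", "C3", 3),
    ("Citroen", "C4", 4), ("Citroen", "C5", 5), ("Citroen", "C6", 6),
    ("Ford", "F1", 7), ("Ford", "F2", 8), ("Ford", "F3", 9), ("Ford", "F4", 10),
    ("Ford", "F5", 11), ("Ford", "F6", 12), ("Ford", "F6", 19),
    ("GenMotor", "G1", 20), ("GenMotor", "G3", 25), ("GenMotor", "G4", 22),
]


def averages(rows, top_n=5):
    """Return {manufacturer: (average_price, average_top_n_price)}."""
    prices = defaultdict(list)
    for manufacturer, _model, price in rows:
        prices[manufacturer].append(price)
    result = {}
    for manufacturer, values in prices.items():
        top = sorted(values, reverse=True)[:top_n]  # the N highest prices
        result[manufacturer] = (mean(values), mean(top))
    return result


for manufacturer, (avg, avg_top) in sorted(averages(ROWS).items()):
    print(f"{manufacturer:<10} {avg:.2f} {avg_top:.2f}")
```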

Here are two pure SQL queries that produce the expected result. First approach:

SELECT
    main.manufacturer,
    AVG(main.price) AS average_price,
    AVG(CASE WHEN rank <= 5 THEN main.price END) AS average_top_5_price
FROM (
    SELECT
        manufacturer,
        price,
        ROW_NUMBER() OVER (PARTITION BY manufacturer ORDER BY price DESC) AS rank
    FROM
        cars
) main
GROUP BY
    main.manufacturer;
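To sanity-check the first query without a Redshift cluster, it can be run against an in-memory SQLite database through Python's standard sqlite3 module. This sketch assumes the bundled SQLite is version 3.25 or newer (the first release with window functions). The table layout (an id primary key plus the three fields) is an assumption mirroring the Django model, and the final ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

ROWS = [
    ("Citroen", "C1", 1), ("Citroen", "C2", 2), ("Citroen", "C3", 3),
    ("Citroen", "C4", 4), ("Citroen", "C5", 5), ("Citroen", "C6", 6),
    ("Ford", "F1", 7), ("Ford", "F2", 8), ("Ford", "F3", 9), ("Ford", "F4", 10),
    ("Ford", "F5", 11), ("Ford", "F6", 12), ("Ford", "F6", 19),
    ("GenMotor", "G1", 20), ("GenMotor", "G3", 25), ("GenMotor", "G4", 22),
]

# First approach: number rows per manufacturer by descending price, then
# average every row, and separately only the rows numbered 1..5.
QUERY = """
SELECT
    main.manufacturer,
    AVG(main.price) AS average_price,
    AVG(CASE WHEN rank <= 5 THEN main.price END) AS average_top_5_price
FROM (
    SELECT
        manufacturer,
        price,
        ROW_NUMBER() OVER (PARTITION BY manufacturer ORDER BY price DESC) AS rank
    FROM cars
) main
GROUP BY main.manufacturer
ORDER BY main.manufacturer
"""


def run(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE cars (id INTEGER PRIMARY KEY, manufacturer TEXT, model TEXT, price REAL)"
        )
        conn.executemany(
            "INSERT INTO cars (manufacturer, model, price) VALUES (?, ?, ?)", rows
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for manufacturer, avg, avg_top in run(ROWS):
    print(manufacturer, round(avg, 2), round(avg_top, 2))
```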

Second approach:

SELECT A.manufacturer, A.avg_price, B.top5_price
FROM (
    SELECT manufacturer, AVG(price) as avg_price
    FROM cars
    GROUP BY manufacturer
) A
JOIN (
    SELECT manufacturer, AVG(price) AS top5_price
    FROM (
        SELECT manufacturer, price, RANK()
        OVER (PARTITION BY manufacturer ORDER BY price DESC, id) AS rank
        FROM cars
    ) ranked
    WHERE rank <= 5
    GROUP BY manufacturer
) B
ON A.manufacturer = B.manufacturer
ORDER BY manufacturer
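The second approach can be checked the same way. In this sketch the inner average is taken over price, the window column is aliased AS rank, and the derived table gets an alias (ranked), which PostgreSQL-style dialects require. The id column is an auto-incrementing key assumed to mirror Django's default primary key; ordering by price DESC, id makes RANK() break ties, so it behaves like ROW_NUMBER() here. Again this assumes SQLite 3.25 or newer.

```python
import sqlite3

ROWS = [
    ("Citroen", "C1", 1), ("Citroen", "C2", 2), ("Citroen", "C3", 3),
    ("Citroen", "C4", 4), ("Citroen", "C5", 5), ("Citroen", "C6", 6),
    ("Ford", "F1", 7), ("Ford", "F2", 8), ("Ford", "F3", 9), ("Ford", "F4", 10),
    ("Ford", "F5", 11), ("Ford", "F6", 12), ("Ford", "F6", 19),
    ("GenMotor", "G1", 20), ("GenMotor", "G3", 25), ("GenMotor", "G4", 22),
]

# Second approach: one subquery for the overall average, one for the
# top-5 average, joined on manufacturer.
QUERY = """
SELECT A.manufacturer, A.avg_price, B.top5_price
FROM (
    SELECT manufacturer, AVG(price) AS avg_price
    FROM cars
    GROUP BY manufacturer
) A
JOIN (
    SELECT manufacturer, AVG(price) AS top5_price
    FROM (
        SELECT manufacturer, price,
               RANK() OVER (PARTITION BY manufacturer ORDER BY price DESC, id) AS rank
        FROM cars
    ) ranked
    WHERE rank <= 5
    GROUP BY manufacturer
) B
ON A.manufacturer = B.manufacturer
ORDER BY A.manufacturer
"""


def run(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE cars (id INTEGER PRIMARY KEY, manufacturer TEXT, model TEXT, price REAL)"
        )
        conn.executemany(
            "INSERT INTO cars (manufacturer, model, price) VALUES (?, ?, ?)", rows
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for manufacturer, avg, avg_top in run(ROWS):
    print(manufacturer, round(avg, 2), round(avg_top, 2))
```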

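So far I have not managed to implement either of these queries with the Django ORM. For the first query, I can't find a way to make Django "select from a subquery"; for the second, I can't find a good way to force Django to "join two subqueries".

PS: Keep in mind that I reduced the table to three fields to simplify this specific problem, but my real table has about 100 columns, and I compute several different aggregates in the same query.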

McP*_*son 0

You can use a combination of values() and annotate() to group by manufacturer, and then compute each group's average with Avg.

average_price is easy to compute:

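from django.db.models import Avg
from django.db.models.functions import Round  # precision= needs Django 4.0+

# values() before annotate() makes the annotation a GROUP BY manufacturer
averages = Cars.objects.values("manufacturer").annotate(
    average_price=Round(Avg("price"), precision=2)
)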

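For intuition, the values("manufacturer").annotate(...) queryset corresponds roughly to a plain GROUP BY query like the one below, run here on an in-memory SQLite database. This is an approximation, not the exact SQL Django generates; print(str(queryset.query)) shows the real statement for your backend. The ORDER BY is added only for a deterministic result.

```python
import sqlite3

ROWS = [
    ("Citroen", "C1", 1), ("Citroen", "C2", 2), ("Citroen", "C3", 3),
    ("Citroen", "C4", 4), ("Citroen", "C5", 5), ("Citroen", "C6", 6),
    ("Ford", "F1", 7), ("Ford", "F2", 8), ("Ford", "F3", 9), ("Ford", "F4", 10),
    ("Ford", "F5", 11), ("Ford", "F6", 12), ("Ford", "F6", 19),
    ("GenMotor", "G1", 20), ("GenMotor", "G3", 25), ("GenMotor", "G4", 22),
]

# Approximate SQL shape of:
#   Cars.objects.values("manufacturer").annotate(average_price=Round(Avg("price"), precision=2))
QUERY = """
SELECT manufacturer, ROUND(AVG(price), 2) AS average_price
FROM cars
GROUP BY manufacturer
ORDER BY manufacturer
"""


def run(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE cars (id INTEGER PRIMARY KEY, manufacturer TEXT, model TEXT, price REAL)"
        )
        conn.executemany(
            "INSERT INTO cars (manufacturer, model, price) VALUES (?, ?, ?)", rows
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for manufacturer, average_price in run(ROWS):
    print(manufacturer, average_price)
```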

But computing the average of each group's top five is a bit more complicated (I think). For that you need a subquery. So, the full code is:
from django.db.models import Avg, OuterRef, Q
from django.db.models.functions import Round

# Correlated subquery: the 5 highest prices for the outer row's manufacturer
group_top_5 = (
    Cars.objects.filter(manufacturer=OuterRef("manufacturer"))
    .order_by("-price")
    .values("price")[:5]
)

# Only rows whose price is among those top-5 prices feed the second average
query_filter = Q(price__in=group_top_5)
averages = Cars.objects.values("manufacturer").annotate(
    average_price=Round(Avg("price"), precision=2),
    average_top_5_price=Round(Avg("price", filter=query_filter), precision=2),
)
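Roughly speaking, the subquery version becomes a correlated subquery inside a conditional average. The SQLite sketch below hand-writes that shape with CASE WHEN; Django itself emits AVG(...) FILTER (WHERE ...) instead on backends that support the FILTER clause, so treat this as an approximation of the generated SQL, not a copy of it. Table and alias names (cars, c, t) are assumptions for illustration.

```python
import sqlite3

ROWS = [
    ("Citroen", "C1", 1), ("Citroen", "C2", 2), ("Citroen", "C3", 3),
    ("Citroen", "C4", 4), ("Citroen", "C5", 5), ("Citroen", "C6", 6),
    ("Ford", "F1", 7), ("Ford", "F2", 8), ("Ford", "F3", 9), ("Ford", "F4", 10),
    ("Ford", "F5", 11), ("Ford", "F6", 12), ("Ford", "F6", 19),
    ("GenMotor", "G1", 20), ("GenMotor", "G3", 25), ("GenMotor", "G4", 22),
]

# For each row c, the inner query (correlated on c.manufacturer) returns the
# 5 highest prices of that manufacturer; only matching rows are averaged.
QUERY = """
SELECT
    c.manufacturer,
    ROUND(AVG(c.price), 2) AS average_price,
    ROUND(AVG(CASE WHEN c.price IN (
        SELECT t.price FROM cars t
        WHERE t.manufacturer = c.manufacturer
        ORDER BY t.price DESC
        LIMIT 5
    ) THEN c.price END), 2) AS average_top_5_price
FROM cars c
GROUP BY c.manufacturer
ORDER BY c.manufacturer
"""


def run(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE cars (id INTEGER PRIMARY KEY, manufacturer TEXT, model TEXT, price REAL)"
        )
        conn.executemany(
            "INSERT INTO cars (manufacturer, model, price) VALUES (?, ?, ?)", rows
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in run(ROWS):
    print(row)
```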

This should give you:

{'manufacturer': 'Citroen', 'average_price': 3.5, 'average_top_5_price': 4.0}
{'manufacturer': 'Ford', 'average_price': 10.86, 'average_top_5_price': 12.2}
{'manufacturer': 'GenMotor', 'average_price': 22.33, 'average_top_5_price': 22.33}

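One caveat: price__in matches rows by price value, not by position, so when several rows tie with the 5th-highest price, all of them enter the "top 5" average, whereas the ROW_NUMBER() query from the question always takes exactly five rows. The pure-Python sketch below contrasts the two rules on the question's Ford prices plus one hypothetical extra row priced 9; the helper names are illustrative only.

```python
from statistics import mean

# Ford prices from the question plus one HYPOTHETICAL extra row priced 9,
# which ties with the 5th-highest price.
ford_prices = [7, 8, 9, 10, 11, 12, 19, 9]


def top_n_by_row(prices, n=5):
    """Average of exactly n rows, like ROW_NUMBER() ... <= n."""
    return mean(sorted(prices, reverse=True)[:n])


def top_n_by_value(prices, n=5):
    """Average of every row whose price is among the n highest prices,
    like filtering with price__in=<top-n subquery>."""
    top_values = set(sorted(prices, reverse=True)[:n])
    return mean(p for p in prices if p in top_values)


print(top_n_by_row(ford_prices))               # 12.2
print(round(top_n_by_value(ford_prices), 2))   # 11.67 (six rows: 19, 12, 11, 10, 9, 9)
```

Without ties the two rules agree; if ties matter in the real data, the window-function formulation is the one that matches "top 5 rows" exactly.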