Implementing a popularity algorithm in Django

Question

The*_*ing 7 python django algorithm postgresql

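I'm building a site similar to reddit and Hacker News, backed by a database of links and votes. I'm implementing Hacker News' popularity algorithm, and things were going smoothly until it came to actually fetching these links and displaying them. The algorithm is simple: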

Y Combinator's Hacker News:
Popularity = (p - 1) / (t + 2)^1.5

Votes divided by age factor.
Where:

p : votes (points) from users.
t : time since submission in hours.

p is subtracted by 1 to negate submitter's vote.
Age factor is (time since submission in hours plus two) to the power of 1.5.
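
As a quick check of the arithmetic, here is a minimal Python sketch of the formula above. The function name `popularity` and its parameters (`points`, `submitted_at`, `now`, `gravity`) are hypothetical names chosen for illustration, not anything from Hacker News itself.

```python
from datetime import datetime, timedelta, timezone


def popularity(points, submitted_at, now=None, gravity=1.5):
    """Score a link as (p - 1) / (t + 2) ** gravity, with t in hours."""
    now = now or datetime.now(timezone.utc)
    hours = (now - submitted_at).total_seconds() / 3600.0
    # Subtract 1 to discount the submitter's own vote.
    return (points - 1) / (hours + 2) ** gravity


# A link with 9 points submitted two hours ago: (9 - 1) / (2 + 2) ** 1.5
now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
example = popularity(9, now - timedelta(hours=2), now=now)
```

Because the score depends on the current time, it changes continuously, which is why it cannot simply be stored once per link and sorted on later.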

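I asked a very similar question earlier, "Complex ordering in Django", but instead of weighing my options I picked one and tried to make it work, because that is how I did it with PHP/MySQL. I now know Django does things quite differently.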

My models look (exactly) like this:

class Link(models.Model):
    category = models.ForeignKey(Category)
    user = models.ForeignKey(User)
    created = models.DateTimeField(auto_now_add = True)
    modified = models.DateTimeField(auto_now = True)
    fame = models.PositiveIntegerField(default = 1)
    title = models.CharField(max_length = 256)
    url = models.URLField(max_length = 2048)

    def __unicode__(self):
        return self.title

class Vote(models.Model):
    link = models.ForeignKey(Link)
    user = models.ForeignKey(User)
    created = models.DateTimeField(auto_now_add = True)
    modified = models.DateTimeField(auto_now = True)
    karma_delta = models.SmallIntegerField()

    def __unicode__(self):
        return str(self.karma_delta)

And my view:

from django.db.models import Sum
from django.shortcuts import render_to_response

def index(request):
    popular_links = Link.objects.select_related().annotate(karma_total = Sum('vote__karma_delta'))
    return render_to_response('links/index.html', {'links': popular_links})

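Following my previous question, I'm now trying to implement the algorithm using a sort function. The answers to that question seemed to suggest putting the algorithm into the select and sorting on it. I plan to paginate these results, so I don't think I can do the sorting in Python without fetching everything. Any suggestions on how to do this efficiently?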

EDIT

This isn't working yet, but I think it's a step in the right direction:

from django.shortcuts import render_to_response
from linkett.apps.links.models import *

def index(request):
    popular_links = Link.objects.select_related()
    popular_links = popular_links.extra(
        select = {
            'karma_total': 'SUM(vote.karma_delta)',
            'popularity': '(karma_total - 1) / POW(2, 1.5)',
        },
        order_by = ['-popularity']
    )
    return render_to_response('links/index.html', {'links': popular_links})

This errors out with:

Caught an exception while rendering: column "karma_total" does not exist
LINE 1: SELECT ((karma_total - 1) / POW(2, 1.5)) AS "popularity", (S...

EDIT 2

A better error?

TemplateSyntaxError: Caught an exception while rendering: missing FROM-clause entry for table "vote"
LINE 1: SELECT ((vote.karma_total - 1) / POW(2, 1.5)) AS "popularity...

My index.html is simple:

{% block content %}

{% for link in links %}
 
  
   karma-up
   {{ link.karma_total }}
   karma-down
  
  {{ link.title }}
  Posted by {{ link.user }} to {{ link.category }} at {{ link.created }}
 
{% empty %}
 No Links
{% endfor %}

{% endblock content %}

EDIT 3 So very close! Again, all these answers are great, but I am concentrating on one in particular because I feel it fits my situation best.

from django.db.models import Sum
from django.shortcuts import render_to_response
from linkett.apps.links.models import *

def index(request):
    popular_links = Link.objects.select_related().extra(
        select = {
            'popularity': '(SUM(links_vote.karma_delta) - 1) / POW(2, 1.5)',
        },
        tables = ['links_link', 'links_vote'],
        order_by = ['-popularity'],
    )
    return render_to_response('links/test.html', {'links': popular_links})

Running this I get an error because I'm missing a GROUP BY clause. Specifically:

TemplateSyntaxError at /
Caught an exception while rendering: column "links_link.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...karma_delta) - 1) / POW(2, 1.5)) AS "popularity", "links_lin...

I'm not sure why links_link.id wouldn't be in my GROUP BY, but I don't know how to alter the GROUP BY myself; Django usually does that for me.
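
For reference, the shape of query the error is asking for can be sketched outside Django. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 rather than PostgreSQL, an `age_hours` column stands in for computing the age from `created`, `POW` is registered by hand, and `POPULAR_SQL` and `popular_page` are hypothetical names. The point is that every selected non-aggregate column appears in GROUP BY, and that LIMIT/OFFSET let the database do the paging.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Register POW ourselves: not every SQLite build ships the math functions.
conn.create_function("POW", 2, lambda base, exp: base ** exp)
conn.executescript("""
    CREATE TABLE links_link (id INTEGER PRIMARY KEY, title TEXT, age_hours REAL);
    CREATE TABLE links_vote (id INTEGER PRIMARY KEY, link_id INTEGER, karma_delta INTEGER);
    INSERT INTO links_link (id, title, age_hours) VALUES
        (1, 'old but loved', 48), (2, 'fresh', 1), (3, 'no votes yet', 5);
    INSERT INTO links_vote (link_id, karma_delta) VALUES
        (1, 1), (1, 1), (1, 1), (1, 1), (2, 1), (2, 1), (2, 1);
""")

POPULAR_SQL = """
    SELECT l.id, l.title,
           (COALESCE(SUM(v.karma_delta), 0) - 1) / POW(l.age_hours + 2, 1.5) AS popularity
    FROM links_link AS l
    LEFT JOIN links_vote AS v ON v.link_id = l.id
    GROUP BY l.id, l.title, l.age_hours   -- every non-aggregated column is grouped
    ORDER BY popularity DESC
    LIMIT ? OFFSET ?
"""


def popular_page(conn, page, per_page=30):
    """Return one page (1-based) of (id, title, popularity) rows, best first."""
    return conn.execute(POPULAR_SQL, (per_page, (page - 1) * per_page)).fetchall()


first_page = popular_page(conn, page=1, per_page=2)
```

The LEFT JOIN with COALESCE keeps links that have no votes yet in the result instead of dropping them.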

Answer 1

Joh*_*ebs 9

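On Hacker News, only the 210 newest stories and the 210 most popular stories are paginated (7 pages of 30 stories each). My guess is that the reason for the limit (at least in part) is this very problem.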

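Why not drop all the fancy SQL for the most popular stories and just keep a running list instead? Once you've established a list of the top 210 stories, you only need to worry about reordering when a new vote comes in, since relative order is maintained over time. And when a new vote does come in, you only need to worry about reordering the story that received the vote.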

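If the story that received the vote is not on the list, calculate that story's score, plus the score of the least popular story that is on the list. If the story that received the vote is lower, you're done. If it's higher, calculate the current score of the second-to-least popular story (story 209) and compare again. Keep working your way up until you find a story with a higher score, then place the newly voted story right below it in the rankings. Unless, of course, it reaches #1.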

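The benefit of this approach is that it limits the set of stories you have to look at to figure out the top stories list. In the absolute worst case, you have to calculate the ranking for 211 stories. So it's very efficient unless you have to build the list from an existing data set, but that's just a one-time penalty, assuming you cache the list somewhere.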
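
The walk-up described above might look roughly like the following sketch. It takes the answer's premise at face value (that stories which receive no vote keep their relative order), and all names here (`score`, `apply_vote`, the `stories` dict standing in for the database, `limit`) are hypothetical.

```python
def score(points, age_hours, gravity=1.5):
    """Hacker News style score: (p - 1) / (t + 2) ** gravity."""
    return (points - 1) / (age_hours + 2) ** gravity


def apply_vote(top, stories, story_id, limit=210):
    """Reposition story_id in `top` (best first) after its points changed.

    top:     list of story ids, most popular first (the cached running list).
    stories: dict mapping id -> (points, age_hours), a stand-in for the database.
    """
    if story_id in top:
        top.remove(story_id)
    voted = score(*stories[story_id])
    # Walk up from the least popular entry until a story scores at least as high.
    pos = len(top)
    while pos > 0 and score(*stories[top[pos - 1]]) < voted:
        pos -= 1
    top.insert(pos, story_id)
    del top[limit:]  # whatever fell off the end is no longer tracked
    return top


stories = {1: (100, 1), 2: (50, 1), 3: (10, 1), 4: (5, 1)}
top = [1, 2, 3]
stories[4] = (60, 1)  # story 4 just received a burst of votes
apply_vote(top, stories, 4, limit=3)
```

Only the stories passed on the way up have their scores recomputed, so the cost of a vote is bounded by the list length rather than by the size of the links table.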

Downvotes are another matter, but I can only upvote (at my karma level, anyway).

Archived:	15 years, 8 months ago
Views:	3636
Last activity:	13 years, 2 months ago