Searching in multiple fields with respect to row order

Question

ayh*_*han 9 python django django-orm django-queryset

I have a model like the following:

class Foo(models.Model):
    fruit = models.CharField(max_length=10)
    stuff = models.CharField(max_length=10)
    color = models.CharField(max_length=10)
    owner = models.CharField(max_length=20)
    exists = models.BooleanField()
    class Meta:
        unique_together = (('fruit', 'stuff', 'color'), )

It is populated with some data:

fruit  stuff  color   owner  exists
Apple  Table   Blue     abc    True
 Pear   Book    Red     xyz   False
 Pear  Phone  Green     xyz   False
Apple  Phone   Blue     abc    True
 Pear  Table  Green     abc    True

I need to merge/join this with a collection (not a queryset):

[('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]

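So basically, when I search this model with this list of tuples, rows 0 and 2 should be returned (counting from zero, the first and third rows of the data above).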

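Currently my workaround is to read Foo.objects.all() into a DataFrame, merge it with the list of tuples, and pass the resulting IDs to Foo.objects.filter(). I also tried iterating over the list and calling Foo.objects.get() on each tuple, but that is very slow. The list is quite big.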

When I tried chaining Q objects as suggested by the current answers, it threw an OperationalError (too many SQL variables).

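My main goal is the following:

As the model shows, these three fields together form my primary key. The table contains around 15k entries. When I get data from another source, I need to check whether that data is already in my table and create/update/delete accordingly (the new data may contain up to 15k entries). Is there a clean and efficient way to check whether these records are already in my table?

Note: the list of tuples does not have to be in that shape. I can modify it, convert it to another data structure, or transpose it.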

Answer 1

Sat*_*dra 5

The fields ('fruit', 'stuff', 'color') are unique together.

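So if your search tuple is ('Apple', 'Table', 'Blue') and we concatenate it, the result will also be a unique string, provided no two different tuples concatenate to the same text.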

f = [('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]
c = [''.join(w) for w in f]
# Output: ['AppleTableBlue', 'PearPhoneGreen']
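
One caveat is worth checking before relying on this: joining with an empty string is only collision-free if no two distinct tuples produce the same text. A minimal sketch, where the second tuple is a made-up row and SEP is a hypothetical separator assumed never to occur in the data:

```python
SEP = '|'  # assumption: this character never appears in fruit, stuff, or color

a = ('Apple', 'Table', 'Blue')
b = ('AppleT', 'able', 'Blue')  # made-up row, different from a

# Joining without a separator loses the field boundaries
plain_a, plain_b = ''.join(a), ''.join(b)

# Joining with a separator keeps the two tuples distinct
sep_a, sep_b = SEP.join(a), SEP.join(b)

print(plain_a == plain_b)  # True: both are 'AppleTableBlue'
print(sep_a == sep_b)      # False: 'Apple|Table|Blue' vs 'AppleT|able|Blue'
```

If a separator is used on the Python side, the same separator has to appear between the fields on the database side as well, otherwise the strings will not match.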

So we can annotate the queryset using Concat and filter on that annotation.

from django.db.models import CharField
from django.db.models.functions import Concat

Foo.objects.annotate(u_key=Concat('fruit', 'stuff', 'color', output_field=CharField())).filter(u_key__in=c)
# Output: <QuerySet [<Foo: #0row >, <Foo: #2row>]>
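
For intuition about what this filter asks the database to do, here is a minimal sketch using the standard-library sqlite3 module instead of the ORM. The table name foo, the CHUNK size, and the hand-written SQL are assumptions for illustration; the SQL Django actually emits will differ in detail. The query binds one variable per key rather than three per tuple, and splitting the keys into batches keeps each statement below SQLite's bound-variable limit (999 by default in older builds), which is the likely source of a "too many SQL variables" error.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE foo (id INTEGER PRIMARY KEY, fruit TEXT, stuff TEXT, color TEXT)'
)
rows = [
    ('Apple', 'Table', 'Blue'),
    ('Pear', 'Book', 'Red'),
    ('Pear', 'Phone', 'Green'),
    ('Apple', 'Phone', 'Blue'),
    ('Pear', 'Table', 'Green'),
]
conn.executemany('INSERT INTO foo (fruit, stuff, color) VALUES (?, ?, ?)', rows)

wanted = [('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]
keys = [''.join(t) for t in wanted]

CHUNK = 500  # assumed batch size, chosen to stay below the variable limit

found = []
for start in range(0, len(keys), CHUNK):
    chunk = keys[start:start + CHUNK]
    placeholders = ', '.join('?' for _ in chunk)
    sql = (
        'SELECT fruit, stuff, color FROM foo '
        f'WHERE fruit || stuff || color IN ({placeholders}) ORDER BY id'
    )
    found.extend(conn.execute(sql, chunk).fetchall())

print(found)  # [('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]
```

The same batching idea carries over to the ORM: the list passed to u_key__in can be sliced and the partial results combined.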

This works for both tuples and lists.

Transposed case

Case 1:

If the input is a list of 2 tuples:

[('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]

then after transposing, the input will be:

transpose_input = [('Apple', 'Pear'), ('Table', 'Phone'), ('Blue', 'Green')]

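We can easily detect that the input is transposed by computing each_tuple_size and input_list_size. We can then transpose it again with zip, and the solution above will work as expected.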

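each_tuple_size = len(transpose_input[0])  # length of each inner tuple
input_list_size = len(transpose_input)     # number of tuples in the list
if each_tuple_size == 2 and input_list_size == 3:
    transpose_again = list(zip(*transpose_input))
    # use transpose_again from here on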

Case 2:

If the input is a list of 3 tuples:

[('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green'), ('Pear', 'Book', 'Red')]

then after transposing, the input will be:

transpose_input = [('Apple', 'Pear', 'Pear'), ('Table', 'Phone', 'Book'), ('Blue', 'Green', 'Red')]

So for any n*n matrix it is impossible to tell from the shape whether the input has been transposed, and the solution above will fail.
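
Both cases can be checked with a small sketch. The helper names transpose and shape are introduced here only for illustration:

```python
def transpose(rows):
    """Swap rows and columns of a list of equal-length tuples."""
    return list(zip(*rows))


def shape(rows):
    """Return (number of tuples, length of each tuple)."""
    return (len(rows), len(rows[0]))


two_rows = [('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]
three_rows = [
    ('Apple', 'Table', 'Blue'),
    ('Pear', 'Phone', 'Green'),
    ('Pear', 'Book', 'Red'),
]

# Case 1: the shape changes, so a transposed input is detectable
print(shape(two_rows), shape(transpose(two_rows)))      # (2, 3) (3, 2)

# Case 2: the shape is identical either way, so it is not detectable
print(shape(three_rows), shape(transpose(three_rows)))  # (3, 3) (3, 3)

# Transposing twice restores the original rows
print(transpose(transpose(three_rows)) == three_rows)   # True
```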

Answer 2

sch*_*ggl 2

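If you know that these fields make up your natural key and you have to query on them heavily, add this natural key as a proper field and take measures to maintain it: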

class FooQuerySet(models.QuerySet):
    def bulk_create(self, objs, batch_size=None):
        objs = list(objs)
        for obj in objs:
            obj.natural_key = Foo.get_natural_key(obj.fruit, obj.stuff, obj.color)
        return super(FooQuerySet, self).bulk_create(objs, batch_size=batch_size)

    # you might override update(...) with proper F and Value expressions, 
    # but I assume the natural key does not change

class FooManager(models.Manager):
    def get_queryset(self):
        return FooQuerySet(self.model, using=self._db)

class Foo(models.Model):
    NK_SEP = '|||'  # sth unlikely to occur in the other fields

    fruit = models.CharField(max_length=10)
    stuff = models.CharField(max_length=10)
    color = models.CharField(max_length=10)
    natural_key = models.CharField(max_length=40, unique=True, db_index=True)

    @staticmethod
    def get_natural_key(*args):
        return Foo.NK_SEP.join(args) 

    def save(self, *args, **kwargs):
        self.natural_key = Foo.get_natural_key(self.fruit, self.stuff, self.color)
        super(Foo, self).save(*args, **kwargs)

    objects = FooManager()

    class Meta:
        unique_together = (('fruit', 'stuff', 'color'), )

Now you can query:

from itertools import starmap

lst = [('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]
existing_foos = Foo.objects.filter(natural_key__in=list(starmap(Foo.get_natural_key, lst)))
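
To see what the lookup values passed to natural_key__in look like without a database, here is a stand-alone sketch of the key construction. The module-level NK_SEP and get_natural_key mirror the model attributes above and are redefined here only so the snippet runs on its own:

```python
from itertools import starmap

NK_SEP = '|||'  # same separator as on the model


def get_natural_key(*args):
    """Join the key parts with the separator, as Foo.get_natural_key does."""
    return NK_SEP.join(args)


lst = [('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]

# starmap unpacks each tuple into positional arguments
keys = list(starmap(get_natural_key, lst))

print(keys)  # ['Apple|||Table|||Blue', 'Pear|||Phone|||Green']
```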

And bulk create:

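# Evaluate the existing keys once, instead of re-running the query for every x
existing_keys = set(existing_foos.values_list('fruit', 'stuff', 'color'))

Foo.objects.bulk_create(
    [
        Foo(fruit=x[0], stuff=x[1], color=x[2])
        for x in lst
        if x not in existing_keys
    ]
)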
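
Since the stated goal is to create, update, or delete depending on what already exists, the comparison itself can also be done with plain set operations on the key tuples once both sides are in memory (about 15k rows each). A minimal sketch: the incoming values are invented, and in a real project existing would presumably be built from something like values_list('fruit', 'stuff', 'color', 'owner').

```python
# Key tuple -> owner, as currently stored (data from the question)
existing = {
    ('Apple', 'Table', 'Blue'): 'abc',
    ('Pear', 'Book', 'Red'): 'xyz',
    ('Pear', 'Phone', 'Green'): 'xyz',
    ('Apple', 'Phone', 'Blue'): 'abc',
    ('Pear', 'Table', 'Green'): 'abc',
}

# Hypothetical data from the other source
incoming = {
    ('Apple', 'Table', 'Blue'): 'abc',   # unchanged
    ('Pear', 'Phone', 'Green'): 'new',   # owner changed
    ('Kiwi', 'Desk', 'Brown'): 'abc',    # not in the table yet
}

# dict key views support set operations directly
to_create = incoming.keys() - existing.keys()
to_delete = existing.keys() - incoming.keys()
to_update = {
    key for key in incoming.keys() & existing.keys()
    if incoming[key] != existing[key]
}

print(sorted(to_create))  # [('Kiwi', 'Desk', 'Brown')]
print(sorted(to_update))  # [('Pear', 'Phone', 'Green')]
print(sorted(to_delete))
```

Each of the three sets can then drive one bulk operation instead of one query per tuple.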

Archived: 8 years, 7 months ago
Views: 345
Last activity: 8 years, 7 months ago