Python中更有效的循环

Question

Python中更有效的循环

mic*_*imo 0 python django performance nested-loops

我有这样的情况,我需要循环遍历两个对象列表并找到相等,然后循环其字段并更改一些属性.看起来像这样

for new_product in products_and_articles['products']:
  for old_product in products_for_update:
    if new_product.article == old_product.article:
      for old_field in old_product._meta.get_all_field_names():
        for new_field in new_product._meta.get_all_field_names():
          if old_field == new_field and old_field != 'id' and old_field != 'slug':
            setattr(old_product, old_field, getattr(new_product, old_field))

Run Code Online (Sandbox Code Playgroud)

显然,这远非好或甚至不可接受.所以我正在寻求建议如何避免这么多循环并增强算法

Answer 1

Ruf*_*ind 5

如果您将流程分解为逻辑可重用的部分,它会有所帮助.

for new_product in products_and_articles['products']:
  for old_product in products_for_update:
    if new_product.article == old_product.article:
      …

Run Code Online (Sandbox Code Playgroud)

例如,您正在做的是找到与特定产品匹配的产品article.既然article是独特的,我们可以这样写:

def find_products_by_article(products, article):
  '''Find all products that match the given article.  Returns
  either a product or 'None' if it doesn't exist.'''
  for products in products:
    return product

Run Code Online (Sandbox Code Playgroud)

然后用:

for old_product in products_for_update:
  new_products = find_products_by_article(
                   products_and_articles['products'],
                   old_product.article)
  …

Run Code Online (Sandbox Code Playgroud)

但是,这可能是很多更有效的,如果我们能够采取的是一种数据结构的优势,对查找窗口优化,即dict(恒定的,而不是线性复杂度).所以我们可以做的是:

# build a dictionary that stores products indexed by article
products_by_article = dict(product.article, product for product in
                           products_and_articles['products'])

for old_product in products_for_update:
  try:
    # look up product in the dictionary
    new_product = products_by_article[old_product.article]
  except KeyError:
    # silently ignore products that don't exist
    continue
  …

Run Code Online (Sandbox Code Playgroud)

如果您经常进行此类查找,最好在products_by_article其他地方重用字典,而不是每次都从头开始构建. 但请注意:如果您使用产品记录的多个表示,则需要使它们始终保持同步!

对于内部循环,请注意new_field这里仅用于检查字段是否存在:

…
  for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():
      if old_field == new_field and old_field != 'id' and old_field != 'slug':
        setattr(old_product, old_field, getattr(new_product, old_field))

Run Code Online (Sandbox Code Playgroud)

(请注意,这有点可疑:任何尚未存在的新字段都会old_product被静默丢弃:这是故意的吗？)

这可以重新打包如下:

def transfer_fields(old, new, exclusions=('id', 'slug')):
  '''Update all pre-existing fields in the old record to have
  the same values as the new record.  The 'exclusions' parameter
  can be used to exclude certain fields from being updated.'''
  # use a set here for efficiency reasons
  fields = frozenset(old._meta.get_all_field_names())
  fields.difference_update(new._meta.get_all_field_names())
  fields.difference_update(exclusions)
  for field in fields:
    setattr(old, field, getattr(new, field))

Run Code Online (Sandbox Code Playgroud)

将所有这些放在一起:

# dictionary of products indexed by article
products_by_article = dict(product.article, product for product in
                           products_and_articles['products'])

for old_product in products_for_update:
  try:
    new_product = products_by_article[old_product.article]
  except KeyError:
    continue          # ignore non-existent products
  transfer_fields(old_product, new_product)

Run Code Online (Sandbox Code Playgroud)

最终代码的时间复杂度为O(n × k),n产品k数量和字段数.

归档时间：	10 年，11 月前
查看次数：	199 次
最近记录：	10 年，11 月前