Creating a multi-level nested dictionary with Python's csv DictReader

Question

Ker*_*nic 3 python csv dictionary python-2.7

Total Python noob here, probably missing something obvious. I've searched everywhere and haven't found a solution yet, so I thought I'd ask for help.

I'm trying to write a function that builds a nested dictionary from a large csv file. The input file is in the following format:

Product,Price,Cost,Brand,
blue widget,5,4,sony,
red widget,6,5,sony,
green widget,7,5,microsoft,
purple widget,7,6,microsoft,

and so on...

The output dictionary I need looks like this:

projects = { `<Brand>`: { `<Product>`: { 'Price': `<Price>`, 'Cost': `<Cost>` },},}

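But obviously with many different brands, each containing different products. In the input file the data is sorted alphabetically by brand name, but I know that as soon as DictReader runs the data becomes unordered, so I definitely need a better way of handling duplicates. The if statement as written is redundant and unnecessary.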

Here is the non-working, useless code I have so far:

def build_dict(source_file):
  projects = {}
  headers = ['Product', 'Price', 'Cost', 'Brand']
  reader = csv.DictReader(open(source_file), fieldnames = headers, dialect = 'excel')
  current_brand = 'None'
  for row in reader:
    if Brand != current_brand:
      current_brand = Brand
    projects[Brand] = {Product: {'Price': Price, 'Cost': Cost}}
  return projects

source_file = 'merged.csv'
print build_dict(source_file)

I have of course imported the csv module at the top of the file.

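What is the best way to do this? I feel like I'm way off course, but there is very little information about creating nested dictionaries from a CSV, and the examples that do exist are highly specific and tend not to explain why the solution actually works, so as someone new to Python it is hard to draw conclusions from them.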

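Also, the input csv file doesn't normally have headers, but to try to get a working version of this function I inserted a header row by hand. Ideally there would be some code that assigns the headers.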

Any help, guidance, or recommendations are greatly appreciated. Thanks!

Answer 1

DSM*_*DSM 5

import csv
from collections import defaultdict

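def build_dict(source_file):
    # defaultdict(dict) creates an empty inner dict the first time a brand is seen
    projects = defaultdict(dict)
    # the file has no header row, so the column names are supplied here
    headers = ['Product', 'Price', 'Cost', 'Brand']
    with open(source_file, 'rb') as fp:  # 'rb' is the Python 2 convention for csv files
        reader = csv.DictReader(fp, fieldnames=headers, dialect='excel',
                                skipinitialspace=True)
        for rowdict in reader:
            # the trailing comma on each line yields an extra field under the key None
            if None in rowdict:
                del rowdict[None]
            brand = rowdict.pop("Brand")
            product = rowdict.pop("Product")
            # what remains in rowdict is just Price and Cost
            projects[brand][product] = rowdict
    return dict(projects)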

source_file = 'merged.csv'
print build_dict(source_file)

produces

{'microsoft': {'green widget': {'Cost': '5', 'Price': '7'},
               'purple widget': {'Cost': '6', 'Price': '7'}},
 'sony': {'blue widget': {'Cost': '4', 'Price': '5'},
          'red widget': {'Cost': '5', 'Price': '6'}}}

from your input data (where merged.csv has no header row, only the data).
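The `del rowdict[None]` step is needed because every line in the sample input ends with a trailing comma, which gives each row one more field than there are headers. As a minimal sketch (assuming Python 3 and an in-memory `io.StringIO` standing in for merged.csv, neither of which is part of the original code), this shows where that extra field ends up:

```python
import csv
import io

# Two rows in the question's format: no header line, trailing comma on each row.
# io.StringIO is only a stand-in for the real merged.csv file.
sample = io.StringIO("blue widget,5,4,sony,\nred widget,6,5,sony,\n")
headers = ['Product', 'Price', 'Cost', 'Brand']

rows = list(csv.DictReader(sample, fieldnames=headers))
first = rows[0]

# Fields beyond the supplied headers are collected in a list under
# DictReader's restkey, which defaults to None.
extra = first.get(None)

print(first['Brand'])
print(extra)
```

Because `fieldnames` is passed explicitly, the first line of the file is treated as data rather than as a header row, which matches the header-less file described in the question.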

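I used a defaultdict here, which behaves just like a dictionary, except that when you refer to a key that doesn't exist, instead of raising an exception it simply creates a default value, in this case a dict. Then I pop (that is, remove) Brand and Product from the row and store what is left.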
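To make the difference concrete, here is a small sketch contrasting a plain dict with `defaultdict(dict)`, followed by the pop-and-store step on a single hand-written row. The row literal and the variable names are illustrative only:

```python
from collections import defaultdict

# With a plain dict, assigning into a brand that is not there yet fails.
plain = {}
try:
    plain['sony']['blue widget'] = {'Price': '5', 'Cost': '4'}
except KeyError:
    missing_key_raised = True
else:
    missing_key_raised = False

# With defaultdict(dict), the inner dict is created on first access,
# so products for the same brand accumulate instead of overwriting each other.
nested = defaultdict(dict)
nested['sony']['blue widget'] = {'Price': '5', 'Cost': '4'}
nested['sony']['red widget'] = {'Price': '6', 'Cost': '5'}

# pop() removes a key and returns its value; the row keeps only Price and Cost.
row = {'Product': 'green widget', 'Price': '7', 'Cost': '5', 'Brand': 'microsoft'}
brand = row.pop('Brand')
product = row.pop('Product')
nested[brand][product] = row

result = dict(nested)
print(result)
```

This also shows why the code in the question lost data: `projects[Brand] = {...}` replaces the whole inner dict on every row, whereas `projects[brand][product] = ...` adds one product to it.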

I think all that's left is to convert the cost and price into numbers instead of strings.
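One possible way to do that conversion, as a sketch only: the helper name `to_numbers` is hypothetical, and `float` is an assumption (for money, `decimal.Decimal` may be a better fit). It walks the nested dictionary and returns a converted copy:

```python
def to_numbers(projects, fields=('Price', 'Cost')):
    """Return a copy of the nested dict with the given fields converted to float.

    to_numbers is a hypothetical helper, not part of the original answer.
    """
    converted = {}
    for brand, products in projects.items():
        converted[brand] = {}
        for product, values in products.items():
            new_values = dict(values)  # copy so the input is left untouched
            for field in fields:
                new_values[field] = float(new_values[field])
            converted[brand][product] = new_values
    return converted


example = {'sony': {'blue widget': {'Cost': '4', 'Price': '5'}}}
numeric = to_numbers(example)
print(numeric)
```

A value that is not a valid number would raise `ValueError` here, so real data may need cleaning or error handling first.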

[Modified to use DictReader directly rather than reader]

Archived:	13 years, 6 months ago
Views:	4543
Last activity:	9 years, 6 months ago