A simpler way to iterate over a complex dictionary in Python?

tra*_*mot 0 python iteration dictionary

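I am iterating over a complex JSON object that has been loaded into Python as a dictionary. Below is a sample of the JSON file. The data of interest is marked with comments.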

{  
   "name":"ns1:timeSeriesResponseType",
   "nil":false,
   "value":{  
      "queryInfo":{  },
      "timeSeries":[  
         {  
            "variable":{  },
            "values":[  
               {  
                  "qualifier":[  ],
                  "censorCode":[  ],
                  "value":[  
                     {  
                        "codedVocabularyTerm":null,
                        "censorCode":null,
                        "offsetTypeID":null,
                        "accuracyStdDev":null,
                        "timeOffset":null,
                        "qualifiers":[  
                           "P",                      # data of interest
                           "Ice"                     # data of interest
                        ],
                        "qualityControlLevelCode":null,
                        "sampleID":null,
                        "dateTimeAccuracyCd":null,
                        "methodCode":null,
                        "codedVocabulary":null,
                        "sourceID":null,
                        "oid":null,
                        "dateTimeUTC":null,
                        "offsetValue":null,
                        "metadataTime":null,
                        "labSampleCode":null,
                        "methodID":null,
                        "value":"-999999",
                        "dateTime":"2015-02-24T03:30:00.000-05:00",
                        "offsetTypeCode":null,
                        "sourceCode":null
                     },
                     {  
                        "codedVocabularyTerm":null,
                        "censorCode":null,
                        "offsetTypeID":null,
                        "accuracyStdDev":null,
                        "timeOffset":null,
                        "qualifiers":[  ],
                        "qualityControlLevelCode":null,
                        "sampleID":null,
                        "dateTimeAccuracyCd":null,
                        "methodCode":null,
                        "codedVocabulary":null,
                        "sourceID":null,
                        "oid":null,
                        "dateTimeUTC":null,
                        "offsetValue":null,
                        "metadataTime":null,
                        "labSampleCode":null,
                        "methodID":null,
                        "value":"-999999",                          # data of interest
                        "dateTime":"2015-02-24T04:00:00.000-05:00", # data of interest
                        "offsetTypeCode":null,
                        "sourceCode":null
                     }
                  ],
                  "sample":[  ],
                  "source":[  ],
                  "offset":[  ],
                  "units":null,
                  "qualityControlLevel":[  ],
                  "method":[  ]
               }
            ],
            "sourceInfo":{  },
            "name":"USGS:03193000:00060:00011"
         },
         {  },  # more of the needed data is stored in here
         {  },  # more of the needed data is stored in here
         {  }   # more of the needed data is stored in here
      ]
   },
   "declaredType":"org.cuahsi.waterml.TimeSeriesResponseType",
   "scope":"javax.xml.bind.JAXBElement$GlobalScope",
   "globalScope":true,
   "typeSubstituted":false
}

Here is my code for stepping through the dictionary to get the data I want and store it in a dictionary with a simpler format:

# Setting up blank variables to store results
outputDict = {}
outputList = []
dateTimeList = []
valueList = []
qualifiersList = []


for key in result["value"]["timeSeries"]:
    for key2 in key:
        if key2 == "values":
            for key3 in key.get(key2):
                for key4 in key3:
                    if key4 == "value":
                        for key5 in key3.get(key4):
                            for key6 in key5:
                                if key6 == "value":
                                    valueList.append(key5.get(key6))
                                if key6 == "dateTime":
                                    dateTimeList.append(key5.get(key6))
                        #print key.get("name")
                        #outputDict[key.get("name")]["dateTime"] = dateTimeList
                        #outputDict[key.get("name")]["values"] = valueList

        if key2 == "name":
            outputList.append(key.get(key2))
            outputDict[key.get(key2)]={"dateTime":None, "values":None, "qualifiers":None}
            outputDict[key.get("name")]["dateTime"] = dateTimeList
            outputDict[key.get("name")]["values"] = valueList
            del dateTimeList[:]
            del valueList[:]

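My question: being somewhat new to Python, can anyone point out any obvious inefficiencies in this code? I can count on the JSON file not changing structure for months, possibly years, so I believe my initial use of for key in result["value"]["timeSeries"]: is fine, but I'm not sure whether the many for loops are unnecessary or inefficient. Is there a simple way to search for and return key:value pairs from a hierarchical dictionary like this one, which contains lists of dictionaries nested inside lists of dictionaries?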
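One hedged way to do the generic search asked about here is a recursive generator that walks dictionaries and lists alike. This is only a sketch: `find_values` and the trimmed `sample` are hypothetical names made up for illustration, and because the walk visits every node it is slower than direct indexing when the structure is known in advance.

```python
def find_values(node, target_key):
    """Yield every value stored under target_key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == target_key:
                yield value
            # Keep descending: the same key name may appear at several levels.
            yield from find_values(value, target_key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, target_key)


# Trimmed-down stand-in for the JSON shown above (hypothetical test data).
sample = {
    "value": {
        "timeSeries": [
            {
                "name": "USGS:03193000:00060:00011",
                "values": [
                    {
                        "value": [
                            {
                                "value": "-999999",
                                "dateTime": "2015-02-24T03:30:00.000-05:00",
                                "qualifiers": ["P", "Ice"],
                            }
                        ]
                    }
                ],
            }
        ]
    }
}

date_times = list(find_values(sample, "dateTime"))
qualifiers = list(find_values(sample, "qualifiers"))
```

Note that in this data the key "value" is reused at several levels, so a search for "value" would also return the enclosing containers, not just the readings. A name-based search is therefore best kept for keys that are unique, such as "dateTime".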

EDIT:

Based on the solution provided by @Alex Martelli, here is a new, more efficient, trimmed-down version of the code:

# Building the output dictionary
for key in result["value"]["timeSeries"]:
    if "values" in key:
        for key2 in key.get("values"):
            if "value" in key2:
                for key3 in key2.get("value"):
                    if "value" in key3:
                        valueList.append(key3.get("value"))
                    if "dateTime" in key3:
                        dateTimeList.append(key3.get("dateTime"))
                    if "qualifiers" in key3:
                        qualifiersList.append(key3.get("qualifiers"))

    if "name" in key:
        outputList.append(key.get("name"))
        outputDict[key.get("name")]={"dateTime":None, "values":None, "qualifiers":None}
        outputDict[key.get("name")]["dateTime"] = dateTimeList[:]      # store copies of the lists rather than
        outputDict[key.get("name")]["values"] = valueList[:]           # references, so the deletes below do not
        outputDict[key.get("name")]["qualifiers"] = qualifiersList[:]  # empty the stored data
        del dateTimeList[:]
        del valueList[:]
        del qualifiersList[:]

It works the same, with 4 fewer lines of code and a faster runtime. Nice.
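The slice copies in that version matter because assigning a list stores a reference to the same object, so a later `del someList[:]` would empty the stored data as well. The sketch below shows the difference; the variable names and the `collect` helper are hypothetical, and building a fresh list per series is offered only as one possible alternative to the copy-then-delete pattern.

```python
shared = []
by_reference = {}
by_copy = {}

shared.append("2015-02-24T03:30:00.000-05:00")
by_reference["dateTime"] = shared   # same list object as `shared`
by_copy["dateTime"] = shared[:]     # shallow copy: a new list object

del shared[:]                       # empties the shared list in place


def collect(records, field):
    """Build a fresh list on every call, so no copy or delete is needed."""
    return [record[field] for record in records if field in record]


# Hypothetical records standing in for the innermost "value" list.
records = [{"value": "-999999"}, {"value": "12.5"}, {"dateTime": "x"}]
values = collect(records, "value")
```

After the `del`, `by_reference["dateTime"]` is empty while `by_copy["dateTime"]` still holds its item.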

EDIT:

Based on the solution suggested by @Two-Bit Alchemist, this also works:

# Building the output dictionary
for key in result["value"]["timeSeries"]:
    print(key)
    for value in key["values"][0]["value"]:
        # qualifiers is a list such as ["P", "Ice"]
        qualifiersList.append(value['qualifiers'])
        valueList.append(value['value'])
        dateTimeList.append(value['dateTime'])


    if "name" in key:
        outputList.append(key.get("name"))
        outputDict[key.get("name")]={"dateTime":None, "values":None, "qualifiers":None}
        outputDict[key.get("name")]["dateTime"] = dateTimeList[:]      # store copies of the lists rather than
        outputDict[key.get("name")]["values"] = valueList[:]           # references, so the deletes below do not
        outputDict[key.get("name")]["qualifiers"] = qualifiersList[:]  # empty the stored data
        del dateTimeList[:]
        del valueList[:]
        del qualifiersList[:]

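The only problem I see is that I am never entirely sure that the first position in the ["values"] list is the one I want. I also lose the checks that the "if" statements provided, which were meant to ensure that no error is introduced if values come back from a bad query.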
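Both concerns can probably be handled without going back to the deeply nested loops. The following is a minimal sketch, not a definitive implementation: `extract_series` and `build_output` are hypothetical helper names, it assumes every entry is a dictionary shaped like the sample JSON, and it iterates over every block in "values" instead of assuming index 0.

```python
def extract_series(series):
    """Collect the readings of one timeSeries entry from every "values" block."""
    date_times, values, qualifiers = [], [], []
    # `.get(...) or []` keeps the loops safe when a key is absent or null,
    # and iterating all blocks avoids assuming that index 0 is the right one.
    for block in series.get("values") or []:
        for reading in block.get("value") or []:
            date_times.append(reading.get("dateTime"))
            values.append(reading.get("value"))
            qualifiers.append(reading.get("qualifiers") or [])
    return {"dateTime": date_times, "values": values, "qualifiers": qualifiers}


def build_output(result):
    """Map each series name to its readings; entries without a name are skipped."""
    output = {}
    for series in (result.get("value") or {}).get("timeSeries") or []:
        name = series.get("name")
        if name is not None:
            output[name] = extract_series(series)
    return output
```

One design choice to be aware of: a reading that lacks "dateTime" or "value" contributes None here, which keeps the three lists aligned by position, whereas the earlier "if" checks simply skipped the missing item.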

EDIT:

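import sys
import time
from datetime import datetime

import requests

# Start time, used for the runtime report further down
now = time.time()

try:

    # requests.get returns a Response object;
    # its body is JSON here because of the settings in the query
    # (`query`, the request URL, is assumed to be defined earlier)
    response = requests.get(url=query)


    # if-else ladder that only performs the parsing of the returned JSON object
    # when the HTTP status code indicates a successful query execution
    if(response.status_code == 200):

        # parsing the JSON body into a Python dictionary
        result = response.json()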

        # Setting up blank variables to store results
        outputDict = {}
        outputList = []
        dateTimeList = []
        valueList = []
        qualifiersList = []


        # Building the output dictionary
        for key in result["value"]["timeSeries"]:
            print(key)
            for value in key["values"][0]["value"]:
                # qualifiers is a list containing ["P", "Ice"]
                qualifiersList.append(value['qualifiers'])
                valueList.append(value['value'])
                dateTimeList.append(value['dateTime'])

            # OLD CODE   
            # if "values" in key:
            #     for key2 in key.get("values"):
            #         if "value" in key2:
            #             for key3 in key2.get("value"):
            #                 if "value" in key3:
            #                     valueList.append(key3.get("value"))
            #                 if "dateTime" in key3:
            #                     dateTimeList.append(key3.get("dateTime"))
            #                 if "qualifiers" in key3:
            #                     qualifiersList.append(key3.get("qualifiers"))

            if "name" in key:
                outputList.append(key.get("name"))
                outputDict[key.get("name")]={"dateTime":None, "values":None, "qualifiers":None}
                outputDict[key.get("name")]["dateTime"] = dateTimeList[:]      # store copies of the lists rather than
                outputDict[key.get("name")]["values"] = valueList[:]           # references, so the deletes below do not
                outputDict[key.get("name")]["qualifiers"] = qualifiersList[:]  # empty the stored data
                del dateTimeList[:]
                del valueList[:]
                del qualifiersList[:]


        # Tracking how long it took to process the data
        elapsed = time.time() - now
        print("Runtime: " + str(elapsed))

        out = {"Status": 'ok', "Results": [[{"myResult": outputDict}]]}

    elif(response.status_code == 400):
        raise Exception("Bad Request, "+ datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
    elif(response.status_code== 403):
        raise Exception("Access Forbidden, "+ datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
    elif(response.status_code == 404):
        raise Exception("Gage location(s) not Found, "+ datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
    elif(response.status_code == 500):
        raise Exception("Internal Server Error, "+ datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
    elif(response.status_code == 503):
        raise Exception("Service Unavailable, "+ datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
    else:
        raise Exception("Unknown Response, "+ datetime.now().strftime('%Y-%m-%d %H:%M:%S'))



except Exception:
    out = {"Status": 'Error', "Message": str(sys.exc_info()[1])}


print(out)
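The status-code ladder above could arguably also be written as a dictionary lookup, which keeps the messages in one place. This is a hedged sketch only: `ERROR_MESSAGES` and `error_message` are hypothetical names, and the message texts are copied from the ladder.

```python
from datetime import datetime

# Messages copied from the if/elif ladder above; any other code falls back
# to "Unknown Response".
ERROR_MESSAGES = {
    400: "Bad Request",
    403: "Access Forbidden",
    404: "Gage location(s) not Found",
    500: "Internal Server Error",
    503: "Service Unavailable",
}


def error_message(status_code, when=None):
    """Return the timestamped message for a non-200 status code."""
    when = when or datetime.now()
    label = ERROR_MESSAGES.get(status_code, "Unknown Response")
    return label + ", " + when.strftime("%Y-%m-%d %H:%M:%S")
```

With this in place, the non-200 branches would reduce to a single `raise Exception(error_message(response.status_code))`.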

Ale*_*lli 5

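You ask about "any obvious inefficiencies in my code", and the answer is yes: in particular, the places where you loop over a dictionary (thus fetching all of its keys sequentially, which is O(N), i.e., it takes time proportional to the number of keys in the dictionary) instead of just using it as a dictionary (which takes O(1) time, i.e., constant time, and is fast, too).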

So, for example, where you have

for key2 in key:
    if key2 == "values":
       ...use key.get(key2)...
    if key2 == "name":
       ...use key.get(key2)...

you should instead have:

if 'values' in key:
   ...use key['values']...
if 'name' in key:
   ...use key['name']...

and similarly for the deeper structures. Things can be optimized further, e.g.:

values = key.get('values')
if values is not None:
    ...use values...
name = key.get('name')
if name is not None:
    ...use name...

to avoid repeated indexing (and, again, similarly at the deeper levels).
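To see the difference between scanning the keys and looking one up directly, a small comparison along these lines can be run. It is a sketch under stated assumptions: `record`, `scan` and `lookup` are hypothetical names, the 1000 filler keys exist only to make the scan cost visible, and the measured times will vary by machine.

```python
import timeit

# A dictionary with many keys, plus the one key we actually care about.
record = {"field%d" % i: i for i in range(1000)}
record["name"] = "USGS:03193000:00060:00011"


def scan(d):
    """Loop over every key and compare: the work grows with len(d)."""
    found = None
    for key in d:
        if key == "name":
            found = d.get(key)
    return found


def lookup(d):
    """Single hashed lookup: the work does not depend on len(d) on average."""
    return d.get("name")


scan_seconds = timeit.timeit(lambda: scan(record), number=200)
lookup_seconds = timeit.timeit(lambda: lookup(record), number=200)
```

Both functions return the same value. On a typical run `scan_seconds` should come out noticeably larger than `lookup_seconds`, and the gap widens as the dictionary grows.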