Why do I get the error "JSONDecodeError: Expecting value: line 1 column 1 (char 0)" after iteration 28?

Edw*_*rán 5 python json python-3.x

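I am trying to extract data from a government API. The API is split into multiple pages, each with 10 observations. I wrote an algorithm that takes the important information from each observation and adds it to a pandas DataFrame. Everything goes well until I reach iteration 29, at which point I get the error mentioned in the title.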


This is the code I wrote:

    #Database Creation Using API
    #Import Libraries
    import requests
    import pandas as pd

    #Define a list of relevant variables to automatize information acquisition
    relevant_vars = ["year","ocid","date","region","title","description","suppliers","buyer","amount"\
                     ,"budget"]#Define a list of relevant variables to automatize information acquisition

    #Creation of empty Pandas Dataframe to save all the pertinent information from the database.
    data_collected = pd.DataFrame(columns = relevant_vars)

    #Access to API's data
    #API number 1: "Búsqueda de procesos de contratación por medio de palabra"

    #Need an initial response to start while loop
    def firstResponse():
        url_t = "https://datosabiertos.compraspublicas.gob.ec/PLATAFORMA/api/search_ocds"
        payload = {"year":"2015","page":"2"}
        r = requests.get(url_t,params = payload).json()
        return r

    #Individual information saver.
    def infoSave(variables,item):
        rp = firstResponse()
        temp = []
        for i in variables:
            i = rp["data"][item][str(i)]
            temp.append(i)
        return temp

    #Information gatherer
    def infoGet(yr,url,obs=0):
        rp = dict.copy(firstResponse())
        observations = 0
        page_count = 0
        debug_count = 0
        #If no observations parameter is set, automatically gather all available data for that year.
        #Make all the API calls for the specific year (each page represents a call)
        while rp["pages"] - rp["page"] > 1:
            page_count = page_count + 1
            print(page_count)
            url_n = url
            payload = {"page":str(page_count),"year":str(yr)}
            rp = requests.get(url_n,params=payload).json()
            #Now that the call has been made, save this information in many variables.
            for item in range(len(rp["data"])):
                debug_count = debug_count + 1
                print(f"Iteration no.{debug_count}"+str(infoSave(relevant_vars,item)))
                year, ocid, date, region, title, description, suppliers, buyer, amount, budget = infoSave(relevant_vars,item)
                #After storing the information in the variables, append it to the pandas dataframe
                final_dataframe = data_collected.append({"year":year,"ocid":ocid,"date":date,\
                                                        "region":region,"title":title,\
                                                        "description":description,\
                                                        "suppliers":suppliers,"buyer":buyer,\
                                                        "amount":amount,"budget":budget},ignore_index \
                                                        = True)
            observations = observations + 1
            if obs == 0:
                pass
            elif observations == obs:
                break

Then I tried running the infoGet function:

    infoGet(2015,"https://datosabiertos.compraspublicas.gob.ec/PLATAFORMA/api/search_ocds",obs=10)

It runs perfectly until iteration 29, when I get the following error message:

    JSONDecodeError                           Traceback (most recent call last)
    ~\AppData\Local\Temp/ipykernel_18876/3797698545.py in <module>
          1 #Extract the required information from API
    ----> 2 infoGet(2015,"https://datosabiertos.compraspublicas.gob.ec/PLATAFORMA/api/search_ocds",obs=10)

    ~\AppData\Local\Temp/ipykernel_18876/506738801.py in infoGet(yr, url, obs)
         47         for item in range(len(rp["data"])):
         48             debug_count = debug_count + 1
    ---> 49             print(f"Iteration no.{debug_count}"+str(infoSave(relevant_vars,item)))
         50             year, ocid, date, region, title, description, suppliers, buyer, amount, budget = infoSave(relevant_vars,item)
         51             #After storing the information in the variables, append it to the pandas dataframe

    ~\AppData\Local\Temp/ipykernel_18876/506738801.py in infoSave(variables, item)
         23 #Individual information saver.
         24 def infoSave(variables,item):
    ---> 25     rp = firstResponse()
         26     temp = []
         27     for i in variables:

    ~\AppData\Local\Temp/ipykernel_18876/506738801.py in firstResponse()
         18     url_t = "https://datosabiertos.compraspublicas.gob.ec/PLATAFORMA/api/search_ocds"
         19     payload = {"year":"2015","page":"2"}
    ---> 20     r = requests.get(url_t,params = payload).json()
         21     return r
         22

    D:\ProgramData\lib\site-packages\requests\models.py in json(self, **kwargs)
        908                     # used.
        909                     pass
    --> 910         return complexjson.loads(self.text, **kwargs)
        911
        912     @property

    D:\ProgramData\lib\json\__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
        344             parse_int is None and parse_float is None and
        345             parse_constant is None and object_pairs_hook is None and not kw):
    --> 346         return _default_decoder.decode(s)
        347     if cls is None:
        348         cls = JSONDecoder

    D:\ProgramData\lib\json\decoder.py in decode(self, s, _w)
        335
        336         """
    --> 337         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
        338         end = _w(s, end).end()
        339         if end != len(s):

    D:\ProgramData\lib\json\decoder.py in raw_decode(self, s, idx)
        353             obj, end = self.scan_once(s, idx)
        354         except StopIteration as err:
    --> 355             raise JSONDecodeError("Expecting value", s, err.value) from None
        356         return obj, end

    JSONDecodeError: Expecting value: line 1 column 1 (char 0)

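I would be very grateful if someone could explain why I get this error when I reach this observation. I tried fetching only that observation and it worked fine: it has exactly the same amount of data as the other observations and is exactly the same type of object.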


Thank you for your help!


Kri*_*ris 4

The reason you are getting this error is:

The server's response is not valid JSON, yet you are parsing it as JSON.

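There can be many reasons for not receiving a JSON response. In this case, it is rate limiting on the server. You cannot call the API in a loop without any delay, because the server only allows a limited burst of requests from the same IP.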

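To fix this, you can add a `sleep` after each call, or increase the page size (if the API allows it). You need to find out this API's rate limits and other restrictions.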
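As a rough sketch of the "sleep between calls" idea, the helper below retries a request with an increasing delay until it gets HTTP 200 with a JSON body. The name `get_json_with_retry` and its parameters are made up for illustration, the delays are guesses rather than this API's documented limits, and `fetch` is assumed to behave like `requests.get`.

```python
import time


def get_json_with_retry(fetch, url, params, max_attempts=5, base_delay=1.0,
                        sleep=time.sleep):
    """Call fetch(url, params=params) until it returns HTTP 200 with JSON.

    Hypothetical helper: `fetch` is assumed to behave like requests.get,
    and `sleep` is injectable so the function can be tried without waiting.
    """
    for attempt in range(max_attempts):
        response = fetch(url, params=params)
        if response.status_code == 200:
            try:
                return response.json()
            except ValueError:
                # The body was not JSON; requests raises a ValueError
                # subclass (JSONDecodeError) in that case.
                pass
        if attempt < max_attempts - 1:
            # Wait longer after each failure: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"No valid JSON from {url} after {max_attempts} attempts")
```

In the question's loop this could stand in for the direct call, e.g. `rp = get_json_with_retry(requests.get, url_n, payload)`. A fixed `time.sleep(1)` after every page would also work; the growing delay is just one way to back off when the server keeps refusing.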

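Also, you should always check the HTTP status of the response. An ideal response always has an `HTTP 20X` status and is JSON as agreed in the API specification. Otherwise you may receive a code such as `HTTP 429 Too Many Requests` or `HTTP 403`, or some other code, where the response may not be JSON.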

Avoid blindly assuming the content is JSON. Use the response object to inspect the response and update your code accordingly. For example:

    response = requests.get(url_n, params=payload)
    if response.status_code == 200:
        rp = response.json()
    else:
        print("Error from server: " + str(response.content))
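Going by the traceback, the failing request is the one inside `firstResponse()`, which `infoSave` calls again for every observation (twice per item in the loop, and always for page 2 of 2015). That multiplies the number of requests and probably makes the rate limit easier to hit. One possible way around it, sketched under the assumption that each page looks like `{"data": [ {...}, ... ]}` as the question's code suggests, is to read the fields from the page you already decoded. The name `extract_rows` is hypothetical.

```python
def extract_rows(page, variables):
    """Build one dict per observation from an already-decoded page.

    Hypothetical helper: assumes `page` is the parsed JSON of one API page
    with a "data" list of observations. Missing keys become None instead of
    raising KeyError. No network request is made here.
    """
    return [
        {name: item.get(name) for name in variables}
        for item in page.get("data", [])
    ]
```

With this, the loop needs only one request per page: collect `extract_rows(rp, relevant_vars)` into a single list and build the frame once at the end with `pd.DataFrame(all_rows, columns=relevant_vars)`. That also avoids `DataFrame.append`, which was removed in pandas 2.0.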