Edw*_*rán 5 python json python-3.x
I am trying to extract data from a government API. The API is split into pages, with 10 observations per page. I wrote an algorithm that takes the important information from each observation and appends it to a pandas DataFrame. Everything went well until I reached iteration 29, at which point I got the error in the title.
Here is the code I wrote:

```python
#Database Creation Using API
#Import Libraries
import requests
import pandas as pd

#Define a list of relevant variables to automatize information acquisition
relevant_vars = ["year","ocid","date","region","title","description","suppliers","buyer","amount"
                 ,"budget"]

#Creation of empty Pandas Dataframe to save all the pertinent information from the database.
data_collected = pd.DataFrame(columns = relevant_vars)

#Access to API's data
#API number 1: "Búsqueda de procesos de contratación por medio de palabra"

#Need an initial response to start while loop
def firstResponse():
    url_t = "https://datosabiertos.compraspublicas.gob.ec/PLATAFORMA/api/search_ocds"
    payload = {"year":"2015","page":"2"}
    r = requests.get(url_t,params = payload).json()
    return r

#Individual information saver.
def infoSave(variables,item):
    rp = firstResponse()
    temp = []
    for i in variables:
        i = rp["data"][item][str(i)]
        temp.append(i)
    return temp

#Information gatherer
def infoGet(yr,url,obs=0):
    rp = dict.copy(firstResponse())
    observations = 0
    page_count = 0
    debug_count = 0
    #If no observations parameter is set, automatically gather all available data for that year.
    #Make all the API calls for the specific year (each page represents a call)
    while rp["pages"] - rp["page"] > 1:
        page_count = page_count + 1
        print(page_count)
        url_n = url
        payload = {"page":str(page_count),"year":str(yr)}
        rp = requests.get(url_n,params=payload).json()
        #Now that the call has been made, save this information in many variables.
        for item in range(len(rp["data"])):
            debug_count = debug_count + 1
            print(f"Iteration no.{debug_count}"+str(infoSave(relevant_vars,item)))
            year, ocid, date, region, title, description, suppliers, buyer, amount, budget = infoSave(relevant_vars,item)
            #After storing the information in the variables, append it to the pandas dataframe
            final_dataframe = data_collected.append({"year":year,"ocid":ocid,"date":date,
                                                     "region":region,"title":title,
                                                     "description":description,
                                                     "suppliers":suppliers,"buyer":buyer,
                                                     "amount":amount,"budget":budget},
                                                    ignore_index = True)
            observations = observations + 1
            if obs == 0:
                pass
            elif observations == obs:
                break
```

Then I run the infoGet method:
```python
infoGet(2015,"https://datosabiertos.compraspublicas.gob.ec/PLATAFORMA/api/search_ocds",obs=10)
```

It runs perfectly until iteration 29, when I get the following error message:
```
JSONDecodeError                           Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_18876/3797698545.py in <module>
      1 #Extract the required information from API
----> 2 infoGet(2015,"https://datosabiertos.compraspublicas.gob.ec/PLATAFORMA/api/search_ocds",obs=10)

~\AppData\Local\Temp/ipykernel_18876/506738801.py in infoGet(yr, url, obs)
     47         for item in range(len(rp["data"])):
     48             debug_count = debug_count + 1
---> 49             print(f"Iteration no.{debug_count}"+str(infoSave(relevant_vars,item)))
     50             year, ocid, date, region, title, description, suppliers, buyer, amount, budget = infoSave(relevant_vars,item)
     51             #After storing the information in the variables, append it to the pandas dataframe

~\AppData\Local\Temp/ipykernel_18876/506738801.py in infoSave(variables, item)
     23 #Individual information saver.
     24 def infoSave(variables,item):
---> 25     rp = firstResponse()
     26     temp = []
     27     for i in variables:

~\AppData\Local\Temp/ipykernel_18876/506738801.py in firstResponse()
     18     url_t = "https://datosabiertos.compraspublicas.gob.ec/PLATAFORMA/api/search_ocds"
     19     payload = {"year":"2015","page":"2"}
---> 20     r = requests.get(url_t,params = payload).json()
     21     return r
     22

D:\ProgramData\lib\site-packages\requests\models.py in json(self, **kwargs)
    908             # used.
    909             pass
--> 910         return complexjson.loads(self.text, **kwargs)
    911
    912     @property

D:\ProgramData\lib\json\__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    344             parse_int is None and parse_float is None and
    345             parse_constant is None and object_pairs_hook is None and not kw):
--> 346         return _default_decoder.decode(s)
    347     if cls is None:
    348         cls = JSONDecoder

D:\ProgramData\lib\json\decoder.py in decode(self, s, _w)
    335
    336         """
--> 337         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338         end = _w(s, end).end()
    339         if end != len(s):

D:\ProgramData\lib\json\decoder.py in raw_decode(self, s, idx)
    353             obj, end = self.scan_once(s, idx)
    354         except StopIteration as err:
--> 355             raise JSONDecodeError("Expecting value", s, err.value) from None
    356         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```

I would really appreciate it if someone could explain why I get this error when I reach that observation. I tried fetching just that observation on its own and it worked fine: it has exactly the same amount of data as the other observations and is exactly the same type of object.
Thanks in advance for your help!
The reason you get this error is that the server's response is not valid JSON, yet you are parsing it as JSON.
There are many possible reasons why a response does not come back as JSON. In this case, it is rate limiting on the server side. You cannot call the API in a loop without any delay, because the server only allows a limited burst of requests from the same IP.
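To see why the parser complains at "char 0", feed it what a rate-limited server typically sends back. A small sketch (`decode_error_for` is a hypothetical helper, not part of the API):

```python
import json

def decode_error_for(body):
    """Return the JSONDecodeError message raised when parsing `body`,
    or None if the body parses as valid JSON."""
    try:
        json.loads(body)
        return None
    except json.JSONDecodeError as exc:
        return str(exc)

# An empty body and an HTML error page -- both typical rate-limit
# responses -- reproduce the exact error from the traceback:
print(decode_error_for(""))                   # Expecting value: line 1 column 1 (char 0)
print(decode_error_for("<html>429</html>"))   # Expecting value: line 1 column 1 (char 0)
```

This is what `Response.json()` effectively does with the body, which is why the traceback ends inside the standard `json` module.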
To fix this, you can add a `sleep` after each call and, if the API allows it, increase the page size. You need to understand this API's rate limits and other restrictions.
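As a sketch of applying the `sleep` idea to the page loop in `infoGet` (the `throttled` helper and the one-second delay are assumptions, not documented limits of this API):

```python
import time

def throttled(pages, delay_seconds=1.0):
    """Yield page numbers, pausing between yields so requests are spaced out.

    delay_seconds=1.0 is a guess; tune it to whatever burst limit the
    server actually enforces.
    """
    for i, page in enumerate(pages):
        if i > 0:
            time.sleep(delay_seconds)
        yield page

# In infoGet, the while loop's page counter could be wrapped like:
#   for page_count in throttled(range(1, total_pages + 1)):
#       rp = requests.get(url_n, params={"page": str(page_count), "year": str(yr)}).json()
```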
Also, you should always check the response's HTTP status. An ideal response always has an `HTTP 20X` status and contains the JSON agreed in the API spec. But you may instead receive codes like `HTTP 429 Too Many Requests` or `HTTP 403`, or possibly others, where the response body may not be JSON at all.
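That status handling can be sketched as a small helper (`classify_response` is hypothetical; which codes this particular API actually returns is not documented in the question):

```python
import json

def classify_response(status_code, body_text):
    """Decide what to do with an HTTP response before trusting its body.

    Returns ("ok", parsed_json) on success, ("retry", None) for
    rate-limit/server errors, or ("error", body_text) otherwise.
    """
    if 200 <= status_code < 300:
        try:
            return ("ok", json.loads(body_text))
        except json.JSONDecodeError:
            return ("error", body_text)  # 2xx but not JSON: inspect the body
    if status_code == 429 or status_code >= 500:
        return ("retry", None)  # back off, sleep, and try again
    return ("error", body_text)

# With requests, the caller would look like:
#   resp = requests.get(url_n, params=payload)
#   outcome, value = classify_response(resp.status_code, resp.text)
```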
Avoid blindly assuming the content is JSON. Use the response object to check it and update your code accordingly. An example looks like:
```python
response = requests.get(url_n, params=payload)
if response.status_code == 200:
    rp = response.json()
else:
    print("Error from server: " + str(response.content))
```