tqdm slows my program down by at least 8x

Asked by Thu*_*ark · score 1 · tags: python, tqdm

For some context: I have a converter that requests JSON data in two steps. The first step requests the full set of data points; the second requests the details of each data point. Since I expect to be requesting a lot of data from this point on, I want to track progress. That is why I use tqdm, and it slows my program's execution down by at least 8x.

```python
import requests
import json
import os
import time
from datetime import timedelta
from datetime import datetime
from datetime import date
import pandas as pd
import shutil
import zipfile
import smtplib, ssl
from progress.bar import Bar
from tqdm import tqdm
from time import sleep
```

Here is the code:

```python
def fetch_data_points(url: str):
    limit_request = 100
    # Placeholder for limit: please do not remove = 1000000000 -JJ
    folder_path_reset("api_request_jsons", "csv", "Geographic_information")
    total_start_time = start_time_measure()
    start_time = start_time_measure(
        'Starting Phase 1: First request from API: Data Points')

    for i in tqdm(range(limit_request)):
        response = requests.get(url, params={"limit": limit_request})
    API_status_report(response)
    end_time_measure(total_start_time, "Request completed: ")
    end_time_measure(total_start_time, "End of Phase 1, completed in: ")
    return response.json()
```

Note the timings here.


This is the console output with tqdm:

```
Starting Phase 1: First request from API: Data Points
100%|██████████| 100/100 [00:21<00:00,  4.69it/s]Successfull connection!

Request completed: 0:00:21.359000
End of Phase 1, completed in: 0:00:21.359000
Saving points
Exported_data\api_request_jsons\Fetch_points\Points.json saved
Point saved: 0:00:00.016000
Data saved. Total time of program run: 0:00:00.016000
Starting Phase 2: Second request from API: 100 requested
  9%|▉         | 9/100 [02:12<22:17, 14.69s/it]
```

This is the console output without tqdm:

```
Starting Phase 1: First request from API: Data Points
Successfull connection!
Request completed: 0:00:00.297000
End of Phase 1, completed in: 0:00:00.297000
Saving points
Exported_data\api_request_jsons\Fetch_points\Points.json saved
Point saved: 0:00:00.015000
Data saved. Total time of program run: 0:00:00.015000
Starting Phase 2: Second request from API: 100 requested
 10%|█         | 10/100 [01:54<16:52, 11.25s/it]
```


As you can see, the program is slowed down enormously. A request for 100 points normally takes 0:00:00.297000, but with tqdm it takes 0:00:21.359000. I expected it to be perhaps twice as slow, but this is far worse. Can anyone give me some pointers on reducing this slowdown as much as possible?
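One way to check how much overhead the bar itself adds is to time a trivial loop with and without it. This is a rough sketch, not taken from the question's code; the loop body and the `mininterval=1.0` value are illustrative:

```python
import time
from tqdm import tqdm

def timed_loop(iterable):
    """Time a trivial loop so the progress bar's own overhead dominates."""
    start = time.perf_counter()
    total = 0
    for i in iterable:
        total += i  # trivial work on purpose
    return time.perf_counter() - start, total

n = 200_000
plain_time, plain_total = timed_loop(range(n))
# Raising mininterval (minimum seconds between display refreshes) cuts
# console I/O, which is usually where tqdm's overhead comes from.
bar_time, bar_total = timed_loop(tqdm(range(n), mininterval=1.0))
print(f"plain: {plain_time:.3f}s  tqdm: {bar_time:.3f}s")
```

If the two timings are close on a tight loop like this, the slowdown in the real program is more likely coming from the loop body than from the bar.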


Edit: OK, I've decided to drop the tqdm progress measurement from the first function; I just could not get it right. It required too much tweaking, and I noticed the data became noticeably inconsistent when I varied the amount of data requested.


So I tried the second function instead; the relevant code is this:


To explain it: it requests the details of each data point and puts them in an array for later use:

```python
def fetch_details_of_data_points(url: str):
    input_json = fetch_data_points(url)
    fetch_points_save(input_json)
    all_location_points_details = []
    amount_of_objects = len(input_json)
    total_start_time = start_time_measure()
    start_time = start_time_measure(f'Starting Phase 2: Second request from API: {str(amount_of_objects)} requested')
    for i in tqdm(range(amount_of_objects), miniters=1):
        for obj in input_json:
            all_location_points_details.append(fetch_details(obj.get("detail")))
```

```python
def fetch_details(url: str):
    response = requests.get(url)
    # Makes request call to get the data of detail
    # save_file(folder_path,GipodId,text2)
    # any other processe
    return response.json()
```

But then I get this error:

```
Message=('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Source=C:\Users\xxxxxx\GIPOD_REQUEST_CONVERSION.py
StackTrace:
File "C:\Users\QF6207\xxxxxx\GIPOD_REQUEST_CONVERSION.py", line 195, in fetch_details
    response = requests.get(url)
File "C:\Users\xxxxxx\GIPOD_REQUEST_CONVERSION.py", line 361, in fetch_details_of_data_points
    all_location_points_details.append(fetch_details(obj.get("detail")))
File "C:\Users\xxxxxx\GIPOD_REQUEST_CONVERSION.py", line 446, in <module> (Current frame)
    fetch_details_of_data_points(api_response_url)
```

As far as I can tell, the request for a data point apparently takes too long at some point, which causes a disconnect. Notably, I know from experience that a request for one data point takes about 0.25 seconds. So in theory the progress bar should update in increments of 1, roughly every 0.25 seconds.
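If the disconnects are transient, one common way to make the detail requests resilient is a `requests.Session` with a retry adapter and an explicit timeout. This is a sketch, not the original code; `make_session` and all parameter values here are assumptions:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session() -> requests.Session:
    # Retry transient failures a few times with exponential backoff
    retry = Retry(total=3, backoff_factor=0.5,
                  status_forcelist=(500, 502, 503, 504))
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    session.mount("http://", HTTPAdapter(max_retries=retry))
    return session

def fetch_details(session: requests.Session, url: str, timeout: float = 10.0):
    # The timeout keeps one slow request from hanging the whole run
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.json()
```

Reusing one session across all detail requests also keeps the TCP connection alive, which can itself reduce the chance of the remote end closing it between calls.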


Now, if this could be solved by using the response time of each get call as the update interval, that would also help make the progress bar more accurate.


So how would I do that?
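One way to do it is to keep a manual bar, advance it exactly once per completed request, and show the last response time in the postfix. This is a sketch; `fetch_all` is hypothetical and `fetch` is a stand-in for any per-URL fetch function:

```python
import time
from tqdm import tqdm

def fetch_all(urls, fetch):
    """Fetch each URL, ticking the bar once per completed request."""
    results = []
    with tqdm(total=len(urls), miniters=1) as bar:
        for url in urls:
            start = time.perf_counter()
            results.append(fetch(url))
            elapsed = time.perf_counter() - start
            bar.set_postfix(last=f"{elapsed:.2f}s")  # show last response time
            bar.update(1)  # advance exactly once per finished request
    return results
```

Because the bar only updates after each request returns, its tick rate naturally follows the real response times instead of a fixed interval.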


Edit: I've found a solution to my problem that adds practically no delay. After reading through the docs, I found a creative way to do a manual update after a function completes.

```python
with tqdm(total=limit) as firstrequest:
    all_location_points_details = fetch_details_of_data_points(url, limit)
    firstrequest.update(limit)


with tqdm(total=amount_of_objects) as second_request:
    for obj in input_json:
        all_location_points_details.append(fetch_details(obj.get("detail")))
        second_request.update(1)
```
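For a loop like the second one, wrapping the iterable in tqdm directly achieves the same effect with less bookkeeping. This sketch uses placeholder data instead of real requests; the URLs are invented:

```python
from tqdm import tqdm

# Placeholder data standing in for the parsed API response
input_json = [{"detail": f"https://example.invalid/item/{i}"} for i in range(3)]

details = []
# tqdm infers the total from len(input_json) and ticks once per element
for obj in tqdm(input_json):
    details.append(obj.get("detail"))  # stand-in for fetch_details(...)
print(len(details))
```

No manual `update(1)` is needed, and the bar still advances once per completed iteration.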

Answer by 小智 (score 5):

From the tqdm documentation:

miniters : int or float, optional

Minimum progress display update interval, in iterations. If 0 and dynamic_miniters, will automatically adjust to equal mininterval (more CPU efficient, good for tight loops). If > 0, will skip display of specified number of iterations. Tweak this and mininterval to get very efficient loops. If your progress is erratic with both fast and slow iterations (network, skipping items, etc), you should set miniters=1.

Increasing this parameter, so the display is refreshed less often, can speed things up.
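As an illustration of the effect: an explicit `miniters` makes tqdm skip that many iterations between display refreshes, so console I/O stops dominating a fast loop. The values here are arbitrary, chosen only for the demo:

```python
from tqdm import tqdm

total = 0
# Refresh the display at most every 1000 iterations instead of every one
for i in tqdm(range(100_000), miniters=1000):
    total += i
print(total)
```

Note that passing `miniters` explicitly also disables dynamic adjustment, so the skip count stays fixed for the whole loop.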