Lau*_*ura 5 python selenium pool multiprocessing
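I'm trying to use multiprocessing. The idea is to collect links from Bing search results while using Selenium to change one of the settings (the CEP setting). I have all the CEPs in a list (filecep), and I want to write all the results to a CSV file.
Here is my getUrlCleans function: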
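
# Imports used by both functions in this post
import time
import multiprocessing as mp
from datetime import date

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

def getUrlCleans(search):
    driver = webdriver.Firefox()

    # One CSV file per day, with a header row
    f = open('out/'+str(date.today())+'.csv','w')
    f.write('url,cep')
    f.write('\n')

    url_cleans=[]

    # The driver and the open file are passed to every worker process
    pool=mp.Pool(mp.cpu_count())
    pool.starmap(getUrlbyCEP,[(cep,driver,search,f) for cep in filecep])
    pool.close()
    f.close()
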
Here is my getUrlbyCEP function:

def getUrlbyCEP(cep,driver,search,f):
    driver.get('https://www.bing.com/account/general?ru=https%3a%2f%2fwww.bing.com%2f%3fFORM%3dZ9FD1&FORM=O2HV65#location')

    cepInput = driver.find_element_by_id('geoname')
    cepInput.clear()
    cepInput.send_keys(cep)
    time.sleep(0.5)
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")

    saveButon=driver.find_element_by_id('sv_btn')
    saveButon.click()

    try:
        driver.find_element_by_id('geoname')
        # continue
    except:
        pass

    searchInput=driver.find_element_by_id('sb_form_q')
    searchInput.send_keys(search)

    driver.find_element_by_id('sb_form_q').send_keys(Keys.ENTER)
    time.sleep(0.5)

    url_cleans=[]

    for i in range(2):
        url_cleans=getLinks(driver,url_cleans)
        time.sleep(2)
        driver.find_element_by_xpath('//*[@title="Próxima página"]').click()
        url_cleans=getLinks(driver,url_cleans)
        for u in url_cleans:
            f.write(u+','+cep)
            f.write('\n')

Finally, I call:

getUrlCleans('sulamerica')

and it gives me an error... I don't know why.

So, instead of multiprocessing, I used threads, and it worked. This is what I changed. Instead of:
pool=mp.Pool(mp.cpu_count())
results = pool.starmap(getUrlbyCEP,[(cep,driver,search,f) for cep in filecep])
from the multiprocessing library (mp), I used this:
from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
results = pool.map(f_partial, filecep)
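The post does not show the error message or how f_partial was built, so the following is only a sketch under assumptions. A likely reason the process pool failed is that mp.Pool pickles every argument before sending it to a worker process, and an open file handle cannot be pickled (a WebDriver instance typically cannot either). Threads share memory, so nothing is pickled. Here f_partial is assumed to be a functools.partial that binds the fixed arguments; fetch_links, worker, run and the lock are hypothetical names introduced for illustration, and fetch_links stands in for the Selenium work so the example runs without a browser.

```python
import pickle
import tempfile
import threading
from functools import partial
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed pool, same API as mp.Pool


def fetch_links(cep, search):
    """Hypothetical stand-in for the Selenium scraping done in getUrlbyCEP."""
    return [f"https://example.com/{search}/{cep}/{i}" for i in range(2)]


def worker(cep, search, out_file, lock):
    """Collect links for one CEP and append them to the shared file."""
    links = fetch_links(cep, search)
    with lock:  # serialize writes so rows from different threads do not interleave
        for url in links:
            out_file.write(f"{url},{cep}\n")
    return len(links)


def is_picklable(obj):
    """Return True if obj could be sent to a worker process by mp.Pool."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


def run(ceps, search):
    lock = threading.Lock()
    with tempfile.TemporaryFile("w+") as out_file:
        out_file.write("url,cep\n")
        # An open file handle cannot be pickled, so a process pool rejects it.
        picklable = is_picklable(out_file)
        # Assumption: f_partial binds everything except the CEP.
        f_partial = partial(worker, search=search, out_file=out_file, lock=lock)
        with ThreadPool(4) as pool:
            counts = pool.map(f_partial, ceps)
        out_file.seek(0)
        rows = out_file.read().splitlines()
    return picklable, counts, rows


if __name__ == "__main__":
    picklable, counts, rows = run(["01001-000", "20040-002"], "sulamerica")
    print("file handle picklable:", picklable)
    print("links per CEP:", counts)
    print("rows written (including header):", len(rows))
```

With real Selenium code, sharing a single driver between threads may cause commands from different threads to interleave, so creating one driver per thread is probably the safer design.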