yak*_*yak 7 javascript python ajax mechanize mechanize-python
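I need to scrape some data from a page where I fill out a form (I have already done this part with mechanize). The problem is that the site returns the results spread over many pages, and I am having trouble getting the data from those pages.
Getting the data from the first results page is no problem, since it is displayed right after the search: I simply submit the form and read the response.
I analyzed the source code of the results page, and it seems to use JavaScript and RichFaces (some library for JSF with ajax, but I may be wrong, since I am not a web expert).
However, I managed to figure out how to get to the remaining results pages. I need to click the links in this form (the ones with href="javascript:void(0);", full code below):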
<td class="pageNumber"><span class="rf-ds " id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233"><span class="rf-ds-nmb-btn rf-ds-act " id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_1">1</span><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_2">2</a><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_3">3</a><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_4">4</a><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_5">5</a><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_6">6</a><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_7">7</a><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_8">8</a><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_9">9</a><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_10">10</a><a class="rf-ds-btn rf-ds-btn-next" href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_next">»</a><a class="rf-ds-btn rf-ds-btn-last" href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_l">»»»»</a>
<script type="text/javascript">new RichFaces.ui.DataScroller("SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233",function(event,element,data){RichFaces.ajax("SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233",event,{"parameters":{"SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233:page":data.page} ,"incId":"1"} )},{"digitals":{"SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_9":"9","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_8":"8","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_7":"7","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_6":"6","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_5":"5","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_4":"4","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_3":"3","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_1":"1","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_10":"10","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_2":"2"} ,"buttons":{"right":{"SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_next":"next","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_l":"last"} } ,"currentPage":1} )</script></span></td>
<td class="pageExport"><script type="text/javascript" src="/opi/javax.faces.resource/download.js?ln=js/component&b="></script><script type="text/javascript">
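So I would like to ask whether there is a way to click all of these links and fetch all of the pages using mechanize (note that more pages are available after the » symbol)? I am asking for answers aimed at a total dummy when it comes to web knowledge :)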
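First of all, I would still stick with selenium, since this is a fairly "javascript-heavy" website. Note that you can use a headless browser (PhantomJS, or a regular browser with a virtual display) if needed.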
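As a rough illustration of the headless option: PhantomJS development has since been suspended, so the sketch below swaps in Firefox's own headless mode instead of PhantomJS or a virtual display. It assumes Python 3, a reasonably recent selenium release, and geckodriver available on the PATH; `make_headless_driver` is a hypothetical helper name, not something from the original answer.

```python
def make_headless_driver():
    """Start Firefox without a visible window (hypothetical helper)."""
    # Imported lazily so this module can be loaded even where selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # Firefox command-line flag for headless mode
    return webdriver.Firefox(options=options)


# Usage (assumption: geckodriver is installed and on the PATH):
# driver = make_headless_driver()
# driver.get("https://polon.nauka.gov.pl/opi/aa/drh/zestawienie?execution=e1s1")
# driver.quit()
```

The rest of the approach should not need to change: the driver returned here can be used in place of a regular `webdriver.Firefox()` instance.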
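The idea here is to paginate by 100 rows per page and keep clicking the "»" link until it is no longer present on the page, which means we have reached the last page and there are no more results to process. To make the solution reliable, we need to use Explicit Waits: every time we proceed to the next page, wait for the loading spinner to become invisible.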
Working implementation:

    # -*- coding: utf-8 -*-
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.select import Select
    from selenium.webdriver.support.wait import WebDriverWait

    driver = webdriver.Firefox()
    driver.maximize_window()

    driver.get('https://polon.nauka.gov.pl/opi/aa/drh/zestawienie?execution=e1s1')
    wait = WebDriverWait(driver, 30)

    # paginate by 100
    select = Select(driver.find_element(By.ID, "drhPageForm:drhPageTable:j_idt211:j_idt214:j_idt220"))
    select.select_by_visible_text("100")

    while True:
        # wait until there is no loading spinner
        wait.until(EC.invisibility_of_element_located((By.ID, "loadingPopup_content_scroller")))

        # .text is a string, so format it with %s rather than %d
        current_page = driver.find_element(By.CLASS_NAME, "rf-ds-act").text
        print("Current page: %s" % current_page)

        # TODO: collect the results

        # proceed to the next page
        try:
            next_page = driver.find_element(By.LINK_TEXT, u"\u00bb")  # the "»" link
            next_page.click()
        except NoSuchElementException:
            break
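The `# TODO: collect the results` step is left open in the implementation. One possible way to fill it in, sketched here under the assumption that the results are rendered as ordinary `<tr>`/`<td>` table rows and that Python 3 is used, is to parse `driver.page_source` with the standard library. `TableRowParser` and `extract_rows` are hypothetical names introduced only for this illustration.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being parsed, or None
        self._cell = None  # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without <td> cells, e.g. header rows
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return a list of rows, each a list of stripped cell texts."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Inside the loop, something like `all_rows.extend(extract_rows(driver.page_source))` could then replace the TODO, with `all_rows = []` created before the loop. Note that this collects rows from every table on the page, including the pager itself, so the output would probably need filtering for the real site.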