I want to use Mechanize to scrape this web page. The form element looks like this:
<form name="ctl00" method="post" action="PSearchResults.aspx?state=ME&rp=" id="ctl00">
<div>
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="verylongstring" /> </div>
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWAgKb7POZAwK4v7ffCOmari00yJft/iuZBMdOH/zh9TDI" />
</div>
</form>
I am using Mechanize to print the form's controls, but it only sees two of them. If I run this:
br.select_form(name='ctl00')
br.form.set_all_readonly(False) # allow changing the .value of all controls
for control in br.form.controls:
    if not control.name:
        print " - (type) =", (control.type)
        continue
    print " - (name, type, value) =", (control.name, control.type, br[control.name])
All that gets printed is this:
- (name, type, value) = ('__VIEWSTATE', 'hidden', '/wEPDwUGNDQ5NTMwD2QWAgIBD2QWAgIHD2QWCgIBDw8WAh4E...more
- (name, type, value) = ('__EVENTVALIDATION', 'hidden', '/wEWAgKb7POZAwK4v7ffCOmari00yJft/iuZBMdOH/zh9TDI')
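Why can't Mechanize "see" the __EVENTTARGET and __EVENTARGUMENT fields?
The site is checking the User-Agent and serving a different page to Mechanize.
Specifying this as the User-Agent seems to work fine: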
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
Here is a link showing how to set the User-Agent with mechanize.
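One way to confirm that the server really is returning different markup is to list the hidden inputs in the raw HTML it sent back, independently of Mechanize's form parser. The following is a minimal sketch using only the standard library; the names HiddenInputCollector, hidden_input_names, missing_fields, SAMPLE and EXPECTED are made up for illustration, and SAMPLE is a shortened stand-in for the page, not the real response.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect the name of every <input type="hidden"> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (attribute, value) pairs.
        # Self-closing tags such as <input ... /> also end up here by default.
        if tag != "input":
            return
        attr_map = dict(attrs)
        is_hidden = (attr_map.get("type") or "").lower() == "hidden"
        if is_hidden and attr_map.get("name"):
            self.names.append(attr_map["name"])


def hidden_input_names(html):
    """Return the names of all hidden inputs, in document order."""
    parser = HiddenInputCollector()
    parser.feed(html)
    parser.close()
    return parser.names


def missing_fields(html, expected):
    """Return the expected field names that do not appear in the HTML."""
    present = set(hidden_input_names(html))
    return [name for name in expected if name not in present]


# Hypothetical stand-in for the page served to an unrecognised User-Agent.
SAMPLE = """<form name="ctl00" method="post" action="PSearchResults.aspx" id="ctl00">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="verylongstring" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="abc" />
</form>"""

EXPECTED = ["__EVENTTARGET", "__EVENTARGUMENT", "__VIEWSTATE", "__EVENTVALIDATION"]

if __name__ == "__main__":
    print(missing_fields(SAMPLE, EXPECTED))
```

In practice the HTML string would be the decoded body of whatever was fetched. If the two fields are absent from the raw HTML under the default User-Agent but present once a browser User-Agent is set, the server is varying its output and Mechanize's parser is not at fault.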
As a follow-up: I ran into the same problem using mechanize (Python), and I tried setting the User-Agent to
br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 5.2; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.47 Safari/536.11')]
as suggested by this site: http://stockrt.github.com/p/emulating-a-browser-in-python-with-mechanize/
However, that did not work, so I chose to add the missing form elements myself with the following code:
br.select_form(name='form')
br.form.set_all_readonly(False) # allow changing the .value of all controls
br.form.new_control('text','__EVENTARGUMENT',{'value':''})
br.form.new_control('text','__EVENTTARGET',{'value':''})
br.form.fixup()
br["__EVENTTARGET"] = 'lbSearch'
br["__EVENTARGUMENT"] = ''
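The same workaround can be wrapped in a small helper that only adds a field when the form does not already have it, so it stays harmless on pages where the server did send the fields. This is a sketch, not part of mechanize: ensure_controls is a hypothetical name, and it relies only on the form's controls list and its new_control and fixup methods, which are the calls used in the snippet above. It creates the controls as "hidden" rather than "text", which is my own choice; as far as I know mechanize treats hidden controls as read-only by default, so the value is supplied at creation time instead of being assigned afterwards.

```python
def ensure_controls(form, defaults):
    """Add a hidden control for every name in `defaults` that `form` lacks.

    `form` is expected to behave like a mechanize form: it exposes a
    `controls` list plus `new_control(type, name, attrs)` and `fixup()`.
    `defaults` maps field names to the value each new field should carry.
    Returns the list of names that were actually added.
    """
    existing = {control.name for control in form.controls}
    added = []
    for name, value in defaults.items():
        if name in existing:
            continue  # the server sent this field; leave it untouched
        form.new_control("hidden", name, {"value": value})
        added.append(name)
    if added:
        # Let the form rebuild its internal bookkeeping after adding controls.
        form.fixup()
    return added
```

With the form from the snippet above selected, the call would look like ensure_controls(br.form, {"__EVENTTARGET": "lbSearch", "__EVENTARGUMENT": ""}), where 'lbSearch' is the postback target used in that snippet.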