通过Python脚本在网页中填写表单值(不测试)

Hab*_*iai 34 python

我需要在目标页面上填写表单值,然后通过Python单击一个按钮.我看过Selenium和Windmill,但这些都是测试框架 - 我没有测试.我正在尝试以编程方式登录第三方网站,然后下载并解析我们需要插入到我们数据库中的文件.测试框架的问题在于它们启动了浏览器实例; 我只想要一个我可以安排每天运行的脚本来检索我想要的页面.有什么办法吗?

Vin*_*vic 28

您正在寻找Mechanize

表格提交样本:

import re
from mechanize import Browser

br = Browser()
br.open("http://www.example.com/")
br.select_form(name="order")
# Browser passes through unknown attributes (including methods)
# to the selected HTMLForm (from ClientForm).
br["cheeses"] = ["mozzarella", "caerphilly"]  # (the method here is __setitem__)
response = br.submit()  # submit current form
Run Code Online (Sandbox Code Playgroud)


RAT*_*THI 18

看看这个使用Mechanize的例子:它将给出基本的想法:

#!/usr/bin/python
import re 
from mechanize import Browser
br = Browser()

# Ignore robots.txt
br.set_handle_robots( False )
# Google demands a user-agent that isn't a robot
br.addheaders = [('User-agent', 'Firefox')]

# Retrieve the Google home page, saving the response
br.open( "http://google.com" )

# Select the search box and search for 'foo'
br.select_form( 'f' )
br.form[ 'q' ] = 'foo'

# Get the search results
br.submit()

# Find the link to foofighters.com; why did we run a search?
resp = None
for link in br.links():
    siteMatch = re.compile( 'www.foofighters.com' ).search( link.url )
    if siteMatch:
        resp = br.follow_link( link )
        break

# Print the site
content = resp.get_data()
print content
Run Code Online (Sandbox Code Playgroud)


Clu*_*ess 8

您可以使用标准urllib库来执行以下操作:

import urllib

urllib.urlretrieve("http://www.google.com/", "somefile.html", lambda x,y,z:0, urllib.urlencode({"username": "xxx", "password": "pass"}))
Run Code Online (Sandbox Code Playgroud)