Ada*_*tan 1 python scripting form-submit data-harvest
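The bus company I use runs an awful website (Hebrew, English) that turns a simple "timetable from A to B today" query into a nightmare. I suspect they are trying to encourage use of their costly SMS query system.
I'm trying to harvest the entire timetable from the site by submitting a query from every possible point to every possible point, which would total about 10k queries. The query result is displayed in a popup window. I'm quite new to web programming, but familiar with the basic aspects of Python.
Thanks!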
gim*_*mel 10
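Twill is a simple scripting language for web browsing. It happens to have a Python API.
twill is essentially a thin shell around the mechanize package. All twill commands are implemented in the commands.py file, and pyparsing does the work of parsing the input and converting it into Python commands (see parse.py). The interactive shell and readline support are implemented via the cmd module (from the standard Python library).
An example of "pressing" submit, taken from the twill documentation: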
from twill.commands import go, showforms, formclear, fv, submit
go('http://issola.caltech.edu/~t/qwsgi/qwsgi-demo.cgi/')
go('./widgets')
showforms()
formclear('1')
fv("1", "name", "test")
fv("1", "password", "testpass")
fv("1", "confirm", "yes")
showforms()
submit('0')
Geo*_*Geo 10
I'd suggest you use mechanize. Here's a code snippet from its page that shows how to submit a form:
import re
from mechanize import Browser
br = Browser()
br.open("http://www.example.com/")
# follow second link with element text matching regular expression
response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)
assert br.viewing_html()
print(br.title())
print(response1.geturl())
print(response1.info()) # headers
print(response1.read()) # body
response1.close() # (shown for clarity; in fact Browser does this for you)
br.select_form(name="order")
# Browser passes through unknown attributes (including methods)
# to the selected HTMLForm (from ClientForm).
br["cheeses"] = ["mozzarella", "caerphilly"] # (the method here is __setitem__)
response2 = br.submit() # submit current form
# print currently selected form (don't call .submit() on this, use br.submit())
print(br.form)
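You rarely want to actually "press the submit button" rather than make a GET or POST request directly to the handler resource. Look at the HTML of the page containing the form, and see which parameters it submits to which URL, and whether it uses the GET or POST method. You can form these requests easily enough with urllib(2).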
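A minimal sketch of that approach, for what it's worth. It uses Python 3, where urllib2's functionality lives in urllib.request. The handler URL and the field names "from" and "to" are made up for illustration; the real ones have to be read from the form's HTML. The helper names build_request and all_requests are likewise hypothetical.

```python
from itertools import permutations
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical handler URL; take the real one from the form's "action" attribute.
HANDLER_URL = "http://www.example.com/schedule"


def build_request(origin, destination, method="GET"):
    """Build (but do not send) the request the form would have submitted."""
    # "from" and "to" are assumed field names, not the site's real ones.
    query = urlencode({"from": origin, "to": destination})
    if method == "GET":
        # GET: the encoded fields go into the URL's query string.
        return Request(HANDLER_URL + "?" + query)
    # POST: the encoded fields travel in the request body, as bytes.
    return Request(HANDLER_URL, data=query.encode("ascii"))


def all_requests(stops, method="GET"):
    """Yield one request per ordered pair of distinct stops."""
    for origin, destination in permutations(stops, 2):
        yield build_request(origin, destination, method)
```

Each request can then be sent with urllib.request.urlopen(req) and the body read with .read(). With around 10k queries it is probably wise to pause between requests (for example with time.sleep) so the site isn't hammered.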