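What I mean is: suppose I go to "www.yahoo.com/thispage", and Yahoo has set up a filter that redirects /thispage to /thatpage. Whenever someone visits /thispage, he or she lands on /thatpage.
If I use httplib/requests/urllib, will it know that a redirect happened? What about error pages? Some sites redirect the user to /errorpage whenever a page cannot be found.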
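A minimal sketch of how a redirect can be noticed, written for Python 3 (where urllib2 became urllib.request). It assumes a throwaway local server that redirects /thispage to /thatpage; the names `_DemoHandler`, `fetch_and_report` and `demo` are made up for illustration. urllib follows redirects on its own, so the check compares the requested URL with the one reported by `geturl()`:

```python
import http.server
import threading
import urllib.request


class _DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local server: /thispage redirects to /thatpage."""

    def do_GET(self):
        if self.path == "/thispage":
            self.send_response(302)
            self.send_header("Location", "/thatpage")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"that page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_and_report(url, opener):
    """Return (final_url, was_redirected, status) for a GET request."""
    with opener.open(url) as resp:
        final_url = resp.geturl()
        return final_url, final_url != url, resp.status


def demo():
    server = http.server.HTTPServer(("127.0.0.1", 0), _DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An empty ProxyHandler keeps the local request away from any system proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        base = "http://127.0.0.1:%d" % server.server_address[1]
        return fetch_and_report(base + "/thispage", opener)
    finally:
        server.shutdown()
        server.server_close()
```

With the lower-level httplib (http.client in Python 3) redirects are not followed, so the 3xx status and the Location header are visible directly on the response. A site that redirects missing pages to /errorpage and answers 200 there can only be recognised by the final URL; a plain 404 raises urllib.error.HTTPError instead.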
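I am trying to get a login script working and I keep getting the same login page back, so I turned on debugging of the HTTP stream (wireshark and the like are no use because of https).
I got nothing, so I copied this example, and it works. Any query to google.com works, but my target page shows no debug output. What is the difference? If it were a redirect, I would expect to see the first GET and the redirect headers, just as with the http://google redirect.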
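One possible explanation, offered as an assumption rather than a diagnosis: HTTPHandler only serves http:// URLs, so https:// requests go through a separate HTTPSHandler that the opener adds with debugging switched off. A minimal Python 3 sketch (urllib2 is urllib.request there; `build_debug_opener` is a hypothetical helper name) that enables debug output for both schemes:

```python
import urllib.request


def build_debug_opener(level=1):
    """Opener that prints the HTTP exchange for http:// and https:// URLs."""
    return urllib.request.build_opener(
        urllib.request.HTTPHandler(debuglevel=level),
        urllib.request.HTTPSHandler(debuglevel=level),
    )


opener = build_debug_opener()
# opener.open("https://example.org/") would now print send:/reply:/header: lines.
```

Passing handler instances replaces the default handlers of the same class, so the opener ends up with exactly one HTTPSHandler, the one with debugging on.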
import urllib
import urllib2
import pdb
# Print the raw exchange (send/reply/header lines) for requests handled by HTTPHandler.
h=urllib2.HTTPHandler(debuglevel=1)
opener = urllib2.build_opener(h)
urllib2.install_opener(opener)
print '================================'
data = urllib2.urlopen('http://google.com').read()
print '================================'
data = urllib2.urlopen('https://google.com').read()
print '================================'
data = urllib2.urlopen('https://members.poolplayers.com/default.aspx').read()
print '================================'
data = urllib2.urlopen('https://google.com').read()
When I run it, I get this:
$ python ex.py
================================
send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: google.com\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 301 Moved Permanently\r\n'
header: Location: http://www.google.com/
header: Content-Type: text/html; charset=UTF-8
header: Date: Sat, 02 Jul 2011 16:20:11 GMT
header: Expires: Mon, 01 Aug 2011 16:20:11 GMT
header: Cache-Control: public, …

I am trying to scrape data from the URLs below. However, driver.get(url) sometimes fails with [Errno 104] Connection reset by peer and sometimes with [Errno 111] Connection refused. On rare occasions it works fine, and on my Mac with a real browser the same spider works every single time, so this is not related to my spider.
I have tried many solutions, such as waiting for a selector on the page, implicit waits, using selenium-requests and passing proper request headers. But nothing seems to work.
http://www.snapdeal.com/offers/deal-of-the-day
https://paytm.com/shop/g/paytm-home/exclusive-discount-deals
I am using python, selenium and a headless Firefox webdriver to achieve this. The OS is CentOS 6.5.
Note: I have many AJAX-heavy pages that are scraped successfully; some of them are below.
http://www.infibeam.com/deal-of-the-day.html, http://www.amazon.in/gp/goldbox/ref=nav_topnav_deals
I have already spent many days trying to debug the issue, with no luck. Any help would be appreciated.
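This does not explain why the connection is reset, but intermittent failures like these are often worked around by retrying with a growing delay. A minimal Python 3 sketch under stated assumptions: `get_with_retries` is a hypothetical helper, `load` stands for something like `driver.get`, and the error is assumed to surface as a ConnectionError subclass (which exception type Selenium actually raises depends on its version):

```python
import time


def get_with_retries(load, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call load(url); on a connection error wait and retry with backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return load(url)
        except ConnectionError:  # covers reset-by-peer and refused
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** (attempt - 1))
```

A call would look like `get_with_retries(driver.get, url)`. The `sleep` parameter exists only so the waiting can be replaced in tests.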
Is it possible to get the filename
e.g. xyz.com/blafoo/showall.html
if you work with urllib or httplib?
So that I can save the file under that filename on the server?
If you go to a site like
xyz.com/blafoo/
you don't see the filename.
Thanks
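A minimal Python 3 sketch of one way to pick a name, assuming the server either sends a Content-Disposition header or redirects to a URL whose path ends in the file name. `filename_from_response` and the `default` fallback are hypothetical; if the server answers xyz.com/blafoo/ directly without either hint, the real file name is simply not exposed to the client:

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def filename_from_response(final_url, content_disposition=None, default="index.html"):
    """Guess a file name from Content-Disposition, else from the URL path."""
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
        if name:
            # basename() drops any directory part a server might include
            return posixpath.basename(name)
    name = posixpath.basename(unquote(urlsplit(final_url).path))
    return name or default
```

With urllib.request the two inputs would come from `resp.geturl()` and `resp.headers.get("Content-Disposition")`.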
I am trying to post unicode data using the httplib.request function, as follows:
s = u"?????"
data = """
<spellrequest textalreadyclipped="0" ignoredups="1" ignoredigits="1" ignoreallcaps="0">
<text>%s</text>
</spellrequest>
""" % s
con = httplib.HTTPSConnection("www.google.com")
con.request("POST", "/tbproxy/spell?lang=he", data)
response = con.getresponse().read()
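A sketch of the usual remedy for this kind of request, assuming the failure is an encoding error caused by handing httplib a unicode body: encode the body to bytes before sending and state the length explicitly. It is written for Python 3; `TEMPLATE` and `build_spell_request` are hypothetical names, and the `charset=utf-8` content type is an assumption about what the endpoint accepts:

```python
from xml.sax.saxutils import escape

TEMPLATE = (
    '<spellrequest textalreadyclipped="0" ignoredups="1" '
    'ignoredigits="1" ignoreallcaps="0">'
    "<text>%s</text>"
    "</spellrequest>"
)


def build_spell_request(text):
    """Return (body_bytes, headers) with the XML body encoded as UTF-8."""
    body = (TEMPLATE % escape(text)).encode("utf-8")
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        # Length in bytes, not characters: they differ for non-ASCII text.
        "Content-Length": str(len(body)),
    }
    return body, headers
```

The result would be passed on as `con.request("POST", "/tbproxy/spell?lang=he", body, headers)`.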
But this is the error I get:
Traceback (most recent call last):
  File "C:\Scripts\iQuality\test.py", line 47, in <module>
    print spellFix(u"?á???¿??????")
  File "C:\Scripts\iQuality\test.py", line 26, in spellFix
    con.request("POST", "/tbproxy/spell?lang=%s" % lang, data)
  File "C:\Python27\lib\httplib.py", line 955, in request
    self._send_request(method, url, body, headers)
  File "C:\Python27\lib\httplib.py", line 989, in _send_request
    self.endheaders(body)
  File "C:\Python27\lib\httplib.py", line 951, in endheaders
    self._send_output(message_body)
  File "C:\Python27\lib\httplib.py", line …

I am intermittently getting httplib.CannotSendRequest exceptions when using a chain of SimpleXMLRPCServers that use SocketServer.ThreadingMixIn.
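By "chain" I mean the following:
I have a client script that uses xmlrpclib to call a function on a SimpleXMLRPCServer. That server, in turn, calls another SimpleXMLRPCServer. I realize how convoluted this sounds, but there are good reasons why this architecture was chosen, and I see no reason why it should not be possible.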
(testclient)client_script ---calls-->
(middleserver)SimpleXMLRPCServer ---calls--->
(finalserver)SimpleXMLRPCServer --- does something
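A sketch of one workaround that is often suggested for this kind of chain, under the assumption (not confirmed by the source) that the exception comes from several handler threads in the middle server sharing a single ServerProxy, whose underlying HTTP connection can carry only one request at a time. It is written for Python 3 (xmlrpclib is xmlrpc.client there), runs both servers on free local ports, and uses the hypothetical helpers `start_server`, `make_call_waste` and `demo`:

```python
import threading
import time
import xmlrpc.client
from socketserver import ThreadingMixIn
from xmlrpc.server import SimpleXMLRPCServer


class AsyncXMLRPCServer(ThreadingMixIn, SimpleXMLRPCServer):
    daemon_threads = True  # do not block interpreter exit on open requests


def start_server(functions):
    """Serve the given {name: callable} mapping on a free local port."""
    server = AsyncXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    for name, func in functions.items():
        server.register_function(func, name)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://127.0.0.1:%d" % server.server_address[1]


def waste_time():
    time.sleep(0.1)  # shortened from the 10 seconds in the question
    return True


def make_call_waste(final_url):
    def call_waste():
        # New proxy (and connection) per call instead of one shared global.
        with xmlrpc.client.ServerProxy(final_url) as proxy:
            return proxy.waste_time()
    return call_waste


def demo(n_clients=5):
    final, final_url = start_server({"waste_time": waste_time})
    middle, middle_url = start_server({"call_waste": make_call_waste(final_url)})
    results = []

    def client():
        with xmlrpc.client.ServerProxy(middle_url) as proxy:
            results.append(proxy.call_waste())

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for server in (middle, final):
        server.shutdown()
        server.server_close()
    return results
```

Creating a proxy per call costs one extra connection setup each time; a thread-local proxy would be an alternative with the same isolation.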
I have been able to reproduce the problem with the simple test code below. There are three snippets:
finalserver:
import SocketServer
import time
from SimpleXMLRPCServer import SimpleXMLRPCServer
from SimpleXMLRPCServer import SimpleXMLRPCRequestHandler
class AsyncXMLRPCServer(SocketServer.ThreadingMixIn,SimpleXMLRPCServer): pass
# Create server
server = AsyncXMLRPCServer(('', 9999), SimpleXMLRPCRequestHandler)
server.register_introspection_functions()
def waste_time():
    time.sleep(10)
    return True
server.register_function(waste_time, 'waste_time')
server.serve_forever()
middleserver:
import SocketServer
from SimpleXMLRPCServer import SimpleXMLRPCServer
from SimpleXMLRPCServer import SimpleXMLRPCRequestHandler
import xmlrpclib
class AsyncXMLRPCServer(SocketServer.ThreadingMixIn,SimpleXMLRPCServer): pass
# Create server
server = AsyncXMLRPCServer(('', 8888), SimpleXMLRPCRequestHandler)
server.register_introspection_functions()
s = xmlrpclib.ServerProxy('http://localhost:9999')
def call_waste():
    s.waste_time()
    return True …

I am trying to use httplib to send credit card information to authorize.net. When I try to post the request, I get the following traceback:
  File "./lib/cgi_app.py", line 139, in run
    res = method()
  File "/var/www/html/index.py", line 113, in ProcessRegistration
    conn.request("POST", "/gateway/transact.dll", mystring, headers)
  File "/usr/local/lib/python2.7/httplib.py", line 946, in request
    self._send_request(method, url, body, headers)
  File "/usr/local/lib/python2.7/httplib.py", line 987, in _send_request
    self.endheaders(body)
  File "/usr/local/lib/python2.7/httplib.py", line 940, in endheaders
    self._send_output(message_body)
  File "/usr/local/lib/python2.7/httplib.py", line 803, in _send_output
    self.send(msg)
  File "/usr/local/lib/python2.7/httplib.py", line 755, in send
    self.connect()
  File "/usr/local/lib/python2.7/httplib.py", line 1152, in connect
    self.timeout, self.source_address)
  File "/usr/local/lib/python2.7/socket.py", line 567, in create_connection
    raise error, msg
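gaierror: [Errno -2] Name or …

This may be a very silly question, but I have been staring at this for hours and cannot find what I am doing wrong.
I am trying to authenticate with the Facebook API using Python, but I am having trouble requesting the user access token. After receiving the code, I make a request to https://graph.facebook.com/oauth/access_token as follows: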
conn = httplib.HTTPSConnection("graph.facebook.com")
params = urllib.urlencode({'redirect_uri':request.build_absolute_uri(reverse('some_app.views.home')),
                           'client_id':apis.Facebook.app_id,
                           'client_secret':apis.Facebook.app_secret,
                           'code':code})
conn.request("GET", "/oauth/access_token", params)
response = conn.getresponse()
response_body = response.read()
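In response, I receive
{"error":{"message":"Missing redirect_uri parameter.","type":"OAuthException","code":191}}
Any ideas what could be going wrong? I have verified that the redirect_uri being passed is on the app domain, but could it be a problem that this is hosted locally and that the domain is just redirected to localhost by my hosts file?
Thanks for your help!
Edit:
I got this working using the requests library: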
params = {'redirect_uri':request.build_absolute_uri(reverse('profiles.views.fb_signup')),
          'client_id':apis.Facebook.app_id,
          'client_secret':apis.Facebook.app_secret,
          'code':code}
r = requests.get("https://graph.facebook.com/oauth/access_token",params=params)
However, I would still prefer not to depend on a library for something that should be supported natively without too much difficulty. Maybe that is asking too much...
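The third argument of `conn.request` is the request body, so in the httplib version the encoded parameters travel as a GET body rather than as query parameters, which would explain why the server reports redirect_uri as missing. A minimal Python 3 sketch of the standard-library variant with the parameters in the query string (httplib is http.client there; `build_token_path` and `fetch_access_token` are hypothetical names):

```python
import http.client
from urllib.parse import urlencode


def build_token_path(redirect_uri, client_id, client_secret, code):
    """Return the request path with the OAuth parameters in the query string."""
    query = urlencode({
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,
    })
    return "/oauth/access_token?" + query


def fetch_access_token(path, host="graph.facebook.com"):
    """Send the GET request; parameters are part of the path, no body is sent."""
    conn = http.client.HTTPSConnection(host)
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    finally:
        conn.close()
```

This mirrors what `requests.get(..., params=params)` does when it appends the encoded parameters to the URL.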
It is a web-mining script.
def printer(q,missing):
    while 1:
        tmpurl=q.get()
        try:
            image=urllib2.urlopen(tmpurl).read()
        except httplib.HTTPException:
            missing.put(tmpurl)
            continue
        wf=open(tmpurl[-35:]+".jpg","wb")
        wf.write(image)
        wf.close()
q is a Queue() of URLs, and missing is an empty queue that collects the URLs that raised errors.
It is run in parallel by 10 threads.
Every time I run this, I get the following:
  File "C:\Python27\lib\socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
  File "C:\Python27\lib\httplib.py", line 541, in read
    return self._read_chunked(amt)
  File "C:\Python27\lib\httplib.py", line 592, in _read_chunked
    value.append(self._safe_read(amt))
  File "C:\Python27\lib\httplib.py", line 649, in _safe_read
    raise IncompleteRead(''.join(s), amt)
IncompleteRead: IncompleteRead(5274 bytes read, 2918 more expected)
But I did use except... I also tried other things, such as
httplib.IncompleteRead
urllib2.URLError
and even
image=urllib2.urlopen(tmpurl,timeout=999999).read()
But none of this works.
How can I catch IncompleteRead and URLError?
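A minimal Python 3 sketch of catching both kinds of error in one place (httplib is http.client and urllib2 is split into urllib.request and urllib.error there). `fetch` is a hypothetical helper, and the `opener` parameter exists only so the function can be exercised without a network. It does not explain why the except clause in the script above is bypassed:

```python
import http.client
import urllib.error
import urllib.request


def fetch(url, opener=urllib.request.urlopen):
    """Return (data, error); data is None or the partial body on failure."""
    try:
        # Both the open and the read sit inside the try block, because a
        # truncated body is only detected while reading.
        return opener(url).read(), None
    except http.client.IncompleteRead as exc:
        return exc.partial, exc  # bytes received before the stream ended
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        return None, exc
```

In the worker loop, a non-None error would be the signal to put the URL on the missing queue.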
I am trying to post chunked-encoded data to httpbin.org/post. I tried two options: Requests and httplib.
#!/usr/bin/env python
import requests
def gen():
    l = range(130)
    for i in l:
        yield '%d' % i
if __name__ == "__main__":
    url = 'http://httpbin.org/post'
    headers = {
        'Transfer-encoding':'chunked',
        'Cache-Control': 'no-cache',
        'Connection': 'Keep-Alive',
        #'User-Agent': 'ExpressionEncoder'
    }
    r = requests.post(url, headers = headers, data = gen())
    print r
#!/usr/bin/env python
import httplib
import os.path
if __name__ == "__main__":
    conn = httplib.HTTPConnection('httpbin.org')
    conn.connect()
    conn.putrequest('POST', '/post')
    conn.putheader('Transfer-Encoding', 'chunked')
    conn.putheader('Connection', 'Keep-Alive')
    conn.putheader('Cache-Control', 'no-cache')
    conn.endheaders()
    for i in range(130): …
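When the headers are written by hand as in the httplib variant, every piece of data has to be framed as a chunk before it is sent. A minimal Python 3 sketch of that framing (`encode_chunk`, `LAST_CHUNK` and `chunked_body` are hypothetical names): each chunk is its length in hexadecimal, CRLF, the payload, CRLF, and a zero-length chunk ends the body.

```python
LAST_CHUNK = b"0\r\n\r\n"  # zero-length chunk that terminates the body


def encode_chunk(data):
    """Frame one chunk: hex length, CRLF, payload, CRLF."""
    return b"%x\r\n%s\r\n" % (len(data), data)


def chunked_body(parts):
    """Yield framed chunks for an iterable of byte strings, then the terminator."""
    for part in parts:
        if part:  # an empty chunk would end the body early
            yield encode_chunk(part)
    yield LAST_CHUNK
```

With http.client each framed chunk would be passed to `conn.send()` after `endheaders()`. Requests applies this framing itself when `data` is a generator, so the Transfer-Encoding header does not need to be set by hand there.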