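OK, I am building a multi-stage program, and I can't get the first stage to work. What I want to do is log in to Twitter.com and then read all the direct messages on the user's page.
Eventually I will be scanning all the direct messages for certain things, but that shouldn't be hard.
This is my code so far: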
import urllib
import urllib2
import httplib
import sys
userName = "notmyusername"
password = "notmypassword"
URL = "http://twitter.com/#inbox"
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "http://twitter.com/", userName, password)
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
pageshit = urllib2.urlopen(URL, "80").readlines()
print pageshit
So any insight into what I am doing wrong would be very helpful.
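Two things stand out in the code above: the handler is created but never attached to an opener, and the second positional argument of urlopen is the request body, not a port. The following is a minimal sketch in Python 3 (urllib.request, the successor of urllib2) of how a Basic auth handler is normally wired in. It assumes the target server actually answers with an HTTP Basic challenge, which the question does not establish for twitter.com; the helper name make_basic_auth_opener and the example.com URL are made up for illustration.

```python
import urllib.request


def make_basic_auth_opener(base_url, username, password):
    """Return an opener that answers HTTP Basic auth challenges for base_url."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, base_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    # The handler only takes effect once it is part of an opener.
    return urllib.request.build_opener(handler)


# Hypothetical URL and placeholder credentials, for illustration only.
opener = make_basic_auth_opener("http://example.com/", "notmyusername", "notmypassword")
# To fetch a page, call opener.open(url) without a second argument;
# a second positional argument would be sent as the POST body.
```

Note also that anything after # in a URL (such as #inbox) is a fragment that stays in the browser and is not sent to the server.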
I am trying to connect to an NT-authenticated server using urllib2 and python-ntlm, but I am getting an error. This is the code I am using, taken from the python-ntlm site:
user = 'DOMAIN\user.name'
password = 'Password123'
url = 'http://corporate.domain.com/page.aspx?id=foobar'
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
# create the NTLM authentication handler
auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)
# create and install the opener
opener = urllib2.build_opener(auth_NTLM)
urllib2.install_opener(opener)
# retrieve the result
response = urllib2.urlopen(url)
return response.read()
This is the error I get:
Traceback (most recent call last):
  File "C:\Python27\test.py", line 112, in get_ntlm_data
    response = urllib2.urlopen(url)
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 398, …
I have a problem with my code.
#!/usr/bin/env python3.1
import urllib.request;
# Disguise as a Mozilla browser on a Windows OS
userAgent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)';
URL = "www.example.com/img";
req = urllib.request.Request(URL, headers={'User-Agent' : userAgent});
# Counter for the filename.
i = 0;
while True:
    fname = str(i).zfill(3) + '.png';
    req.full_url = URL + fname;
    f = open(fname, 'wb');
    try:
        response = urllib.request.urlopen(req);
    except:
        break;
    else:
        f.write(response.read());
        i+=1;
        response.close();
    finally:
        f.close();
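The problem seems to arise when I create the urllib.request.Request object (called req). I create it with a URL that does not exist, but later change the URL to what it should be. I do this so that I can reuse the same urllib.request.Request object instead of creating a new one on every iteration. There may be a mechanism for doing this in Python, but I am not sure what it is.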
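One simple alternative to mutating a single Request is to build a fresh one per iteration; Request objects are lightweight, so little is lost. The sketch below shows that approach and is not necessarily the fix for the error in this question. It also gives the base URL an explicit http:// scheme, since current Python 3 versions reject a URL without one. The helper name build_request is hypothetical.

```python
import urllib.request

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
BASE_URL = 'http://www.example.com/img'  # explicit scheme added


def build_request(index):
    """Build a fresh Request for the index-th image, e.g. .../img000.png."""
    fname = str(index).zfill(3) + '.png'
    req = urllib.request.Request(BASE_URL + fname,
                                 headers={'User-Agent': USER_AGENT})
    return fname, req


fname, req = build_request(7)
print(fname, req.full_url)
```

Inside the download loop, each iteration would call build_request(i) and pass the returned Request to urllib.request.urlopen.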
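Edit: the error message is:
>>> response = urllib.request.urlopen(req);
Traceback (most recent call last): …
How can I download multiple links at the same time? My script below works, but it downloads only one at a time and is very slow. I can't figure out how to incorporate multithreading into my script.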
Python script:
from BeautifulSoup import BeautifulSoup
import lxml.html as html
import urlparse
import os, sys
import urllib2
import re
print ("downloading and parsing Bibles...")
root = html.parse(open('links.html'))
for link in root.findall('//a'):
    url = link.get('href')
    name = urlparse.urlparse(url).path.split('/')[-1]
    dirname = urlparse.urlparse(url).path.split('.')[-1]
    f = urllib2.urlopen(url)
    s = f.read()
    if (os.path.isdir(dirname) == 0):
        os.mkdir(dirname)
    soup = BeautifulSoup(s)
    articleTag = soup.html.body.article
    converted = str(articleTag)
    full_path = os.path.join(dirname, name)
    open(full_path, 'w').write(converted)
    print(name)
The HTML file, named links.html:
<a href="http://www.youversion.com/bible/gen.1.nmv-fas">http://www.youversion.com/bible/gen.1.nmv-fas</a>
<a href="http://www.youversion.com/bible/gen.2.nmv-fas">http://www.youversion.com/bible/gen.2.nmv-fas</a>
<a href="http://www.youversion.com/bible/gen.3.nmv-fas">http://www.youversion.com/bible/gen.3.nmv-fas</a>
<a href="http://www.youversion.com/bible/gen.4.nmv-fas">http://www.youversion.com/bible/gen.4.nmv-fas</a>
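Since the script spends most of its time waiting on the network, a thread pool is one common way to overlap the downloads. The sketch below uses Python 3's concurrent.futures rather than the Python 2 modules in the script. The names fetch and download_all, the fetch_one parameter, and the worker count of 8 are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def fetch(url, timeout=30):
    """Download one URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch_one=fetch, max_workers=8):
    """Fetch every URL on a pool of worker threads.

    Returns a dict mapping each URL to its body, or to the exception
    that the download raised.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the downloads run concurrently.
        futures = {url: pool.submit(fetch_one, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if one download fails
                results[url] = exc
    return results
```

With this shape, the list of href values read from links.html would be passed to download_all, and the parsing and file writing could stay in the main thread, operating on the returned bodies.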
I am having a very hard time fetching the results page for this URL with Python's urllib2:
http://www.google.com/search?tbs=sbi:AMhZZitAaz7goe6AsfVSmFw1sbwsmX0uIjeVnzKHjEXMck70H3j32Q-6FApxrhxdSyMo0OedyWkxk3-qYbyf0q1OqNspjLu8DlyNnWVbNjiKGo87QUjQHf2_1idZ1q_1vvm5gzOCMpChYiKsKYdMywOLjJzqmzYoJNOU2UsTs_1zZGWjU-LsjdFXt_1D5bDkuyRK0YbsaLVcx4eEk_1KMkcJpWlfFEfPMutxTLGf1zxD-9DFZDzNOODs0oj2j_1KG8FRCaMFnTzAfTdl7JfgaDf_1t5Vti8FnbeG9i7qt9wF6P-QK9mdvC15hZ5UR29eQdYbcD1e4woaOQCmg8Q1VLVPf4-kf8dAI7p3jM_1MkBBwaxdt_1TsM4FLwh0oHAYKOS5qBRI28Vs0aw5_1C5-WR4dC902Eqm5eAkLiQyAM9J2bioR66g3tMWe-j9Hyh1ID40R1NyXEJDHcGxp7xOn_16XxfW_1Cq5ArdSNzxFvABb1UcXCn5s4_1LpXZxhZbauwaO8cg3CKGLUvl_1wySDB7QIkMIF2ZInEPS4K-eyErVKqOdY9caYUD8X7oOf6sDKFjT7pNHwlkXiuYbKBRYjlvRHPlcPN1WHWCJWdSNyXdZhwDI3VRaKwmi4YNvkryeNMMbhGytfvlNaaelKcOzWbvzCtSNaP2lJziN1x3btcIAplPcoZxEpb0cDlQwId3A5FDhczxpVbdRnOB-Xeq_1AiUTt_1iI6bSgUAinWXQFYWveTOttdSNCgK-VTxV4OCtlrCrZerk27RBLAzT0ol9NOfYmYhiabzhUczWk4NuiVhKN-M4eo76cAsi74PY4V_1lWjvOpI35V_1YLJQrm0fxVcD34wxFYCIllT2gYW09fj3cuBDMNbsaJqPVQ04OOGlwmcmJeAnK96xd_1aMUd6FsVLOSDS7RfS5MNUSyd1jnXvRU_1MF_1Dj8oC8sm7PfVdjm3firiMcaKM28j9kGWbY0heIGLtO_1m6ad-iKfxYEzSux2b5w62LQlP57yS7vX8RFoyKzHA0RrFIEbPBQdNMA3Vpw0G_1LvEjCAPSCV1HH1pDp0l4EnNCvUIAppVXzNMyWT_1gKITj1NLqAn-Z1tH323JwZSc77OftDSreyHJ-BPxn3n7JMkNZFcQx6S7tfBxeqJ1NuDlpax11pw0_1Oi_1nF3vyEP0NbGKSVgNvBv_1tv8ahxvrHn9UnP78FleiOpzUBfdfRPZiT20VEq5-oXtV_1XwIzrd-5_15-cf2yoL7ohyPuv3WKGUGr4YCsYje7_1D8VslqMPsvbwMg9haj3TrBKH7go70ZfPjUv3h1K7lplnnCdV0hrYVQkSLUY1eEor3L--Vu5PlewS60ZH5YEn4qTnDxniV95h8q0Y3RWXJ6gIXitR5y6CofVg
I use the following headers, which I would think should be straightforward:
headers = {'Host':'www.google.com','User-Agent':user_agent,'Accept-Language':'en-us,en;q=0.5','Accept-Encoding':'gzip, deflate','Accept-Charset':'ISO-8859-1,utf-8;q=0.7,*;q=0.7','Connection':'keep-alive','Referer':'http://www.google.co.in/imghp?hl=en&tab=ii','Cookie':'PREF=ID=1d7bc4ff2a5d8bc6:U=1d37ba5a518b9be1:FF=4:LD=en:TM=1300950025:LM=1302071720:S=rkk0IbbhxUIgpTyA; NID=51=uNq6mZ385WlV1UTfXsiWkSgnsa6PdjH4l9ph-vSQRszBHRcKW3VRJclZLd2XUEdZtxiCtl5hpbJiS3SpEV7670w_x738h75akcO6Viw47MUlpCZfy4KZ2vLT4tcleeiW; SID=DQAAAMEAAACoYm-3B2aiLKf0cRU8spJuiNjiXEQRyxsUZqKf8UXZXS55movrnTmfEcM6FYn-gALmyMPNRIwLDBojINzkv8doX69rUQ9-'}
When I do the following, I get a result that contains none of what a normal web browser returns:
request=urllib2.Request(url,None,headers)
response=urllib2.urlopen(request)
html=response.read()
Similarly, this code returns a bunch of hexadecimal garbage that I can't read:
request=urllib2.Request(url,headers=headers)
response=urllib2.urlopen(request)
html=response.read()
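The headers above include Accept-Encoding: gzip, deflate, which allows the server to send a compressed body, and urllib2 does not decompress responses on its own. That may be why the bytes look unreadable. Here is a minimal sketch in Python 3 of decoding such a body; the function name decode_body is hypothetical.

```python
import gzip
import zlib


def decode_body(raw, content_encoding):
    """Decompress a response body according to its Content-Encoding header."""
    encoding = (content_encoding or '').lower()
    if encoding == 'gzip':
        return gzip.decompress(raw)
    if encoding == 'deflate':
        try:
            return zlib.decompress(raw)  # zlib-wrapped deflate
        except zlib.error:
            return zlib.decompress(raw, -zlib.MAX_WBITS)  # raw deflate stream
    return raw  # no known compression: return the bytes unchanged
```

It would be called as decode_body(response.read(), response.headers.get('Content-Encoding')). Dropping the Accept-Encoding header from the request is another way to ask for an uncompressed response.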
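Please help, because I am fairly sure this is simple and I must be missing something. I was able to fetch this link in a similar way, and I can also upload an image to images.google.com using the following code: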
import httplib, mimetypes, android, sys, urllib2, urllib, simplejson
def post_multipart(host, selector, fields, files):
    """
    Post fields and files to an http host as multipart/form-data.
    fields is a sequence of (name, value) elements for regular form fields.
    files is a sequence of (name, filename, value) elements for data to be uploaded as files
    Return the server's response page.
    """
    content_type, body = encode_multipart_formdata(fields, …
I need to make HTTP and HTTPS requests using POST, GET, and other methods, specifying headers and a timeout.
There are many examples on the internet, and they are all different:
import urllib.parse
import urllib.request
url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name' : 'Michael Foord',
'location' : 'Northampton',
'language' : 'Python' }
headers = { 'User-Agent' : user_agent }
data = urllib.parse.urlencode(values)
req = urllib.request.Request(url, data, headers)
response = urllib.request.urlopen(req)
the_page = response.read()
or
fetcher = urllib2.build_opener()
fetcher.addheaders.append(('Cookie', 'aaaa=%s' % aaaa))
res = fetcher.open(settings.ABC_URL)
or
req = urllib2.Request(url=url)
req.add_header('X-Real-IP', request.META['REMOTE_ADDR'])
req.add_header('Cookie', request.META['HTTP_COOKIE'])
req.add_header('User-Agent', request.META['HTTP_USER_AGENT'])
resp = urllib2.urlopen(req).read()
or
handler = urllib.urlopen('http://...')
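response …
I am trying to fetch a URL from a Jenkins server. Until recently, I was able to use the pattern described on this page (HOWTO Fetch Internet Resources Using urllib2) to create a password manager that correctly responds to BasicAuth challenges with a username and password. Everything was fine until the Jenkins team changed their security model, and that code no longer works.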
# DOES NOT WORK!
import urllib2
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
top_level_url = "http://localhost:8080"
password_mgr.add_password(None, top_level_url, 'sal', 'foobar')
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(handler)
a_url = 'http://localhost:8080/job/foo/4/api/python'
print opener.open(a_url).read()
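HTTPBasicAuthHandler only sends credentials after the server replies with a 401 challenge. If the server no longer issues that challenge (an assumption here, since the truncated trace does not show the status code), one commonly used workaround is to send the Authorization header on the first request. Below is a minimal sketch in Python 3; the helper name preemptive_basic_auth_request is hypothetical.

```python
import base64
import urllib.request


def preemptive_basic_auth_request(url, username, password):
    """Build a Request that carries Basic credentials on the first attempt."""
    credentials = '{}:{}'.format(username, password).encode('utf-8')
    token = base64.b64encode(credentials).decode('ascii')
    req = urllib.request.Request(url)
    req.add_header('Authorization', 'Basic ' + token)
    return req


req = preemptive_basic_auth_request(
    'http://localhost:8080/job/foo/4/api/python', 'sal', 'foobar')
# urllib.request.urlopen(req).read() would then perform the fetch.
```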
Stack trace:
Traceback (most recent call last):
  File "/home/sal/workspace/jenkinsapi/src/examples/password.py", line 11, in <module>
    print opener.open(a_url).read()
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, …
noob.py here. I am trying to get the content from a page, but the print statement throws an error that I don't understand.
Actual code:
import urllib2
import sys
url = "http://make.wordpress.org/core/page/2/"
response = urllib2.urlopen(url)
html = response.read
print html
Output:
$ python get.py
<bound method _fileobject.read of <socket._fileobject object at 0x3722ec9a8d0>>
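I suspect Python does not like that particular URL, because it works with http://www.python.org instead, but I can't get any useful information to understand why.
What I don't get is that if I enclose this in try: and except: pass, I still get that error message.
Any pointers are welcome.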
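The output shown is not an exception but the printed representation of a bound method: response.read without parentheses refers to the method instead of calling it, which would also explain why wrapping it in try/except changes nothing. Here is a small Python 3 sketch of the difference, using io.BytesIO as a stand-in for the object that urlopen returns (an assumption made so the example runs without network access).

```python
import io

# io.BytesIO stands in for the response object; both expose read().
response = io.BytesIO(b'<html>hello</html>')

method = response.read   # no parentheses: a reference to the method itself
body = response.read()   # parentheses: the method runs and returns the bytes

print(callable(method))
print(body)
```

In the question's code, the corresponding change would be html = response.read().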
website = raw_input('website: ')
with open('words.txt', 'r+') as arquivo:
    for lendo in arquivo.readlines():
        msmwebsite = website + lendo
        try:
            abrindo = urllib2.urlopen(msmwebsite)
            abrindo2 = abrindo.read()
        except URLError as e:
            pass
        if abrindo.code == 200:
            palavras = ['registration', 'there is no form']
            for palavras2 in palavras:
                if palavras2 in abrindo2:
                    print msmwebsite, 'up'
                else:
                    pass
        else:
            pass
It works, but for some reason I get this error on some websites:
if abrindo.code == 200:
NameError: name 'abrindo' is not defined
How can I fix this?
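When urlopen raises, abrindo is never assigned, and because the except branch only does pass, execution still reaches abrindo.code. If that happens on the first URL, the name does not exist yet. One way to avoid this is to skip the rest of the iteration with continue. The sketch below is in Python 3; the function name find_matches and the open_url parameter are hypothetical, the latter added so the logic can be exercised without a network.

```python
import urllib.error
import urllib.request


def find_matches(urls, keywords, open_url=urllib.request.urlopen):
    """Return the URLs that answer with status 200 and contain a keyword."""
    matches = []
    for url in urls:
        try:
            response = open_url(url)
            body = response.read().decode('utf-8', errors='replace')
        except urllib.error.URLError:
            continue  # skip this URL; 'response' was never assigned
        if response.status == 200 and any(word in body for word in keywords):
            matches.append(url)
    return matches
```

Lines read from a file also keep their trailing newline, so calling strip() on each one before appending it to the base URL is usually worthwhile.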
I am trying to create a class that extends HTTPBasicAuthHandler. For some reason, the same approach I used in older code does not work here.
class AuthInfo(urllib2.HTTPBasicAuthHandler):
    def __init__(self, realm, url, username, password):
        self.pwdmgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
        self.pwdmgr.add_password(None, url, username, password)
        super(AuthInfo, self).__init__(self.pwdmgr)
This is the error:
Traceback (most recent call last):
  File "./RestResult.py", line 67, in ?
    auth = AuthInfo(None, "default", "xxxxx", "xxxxxxxx")
  File "./RestResult.py", line 47, in __init__
    super(AuthInfo, self).__init__(self.pwdmgr)
TypeError: super() argument 1 must be type, not classobj
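The message "must be type, not classobj" indicates that, in that Python 2 installation, HTTPBasicAuthHandler is an old-style class, which super() cannot handle. Calling the base initializer directly avoids super() altogether. The sketch below is written for Python 3 (urllib.request), where both forms work; the same explicit-call pattern with urllib2.HTTPBasicAuthHandler.__init__ is the usual Python 2 spelling. The URL and credentials are placeholders.

```python
import urllib.request


class AuthInfo(urllib.request.HTTPBasicAuthHandler):
    def __init__(self, realm, url, username, password):
        # 'realm' is accepted but unused, mirroring the question's code.
        self.pwdmgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        self.pwdmgr.add_password(None, url, username, password)
        # Call the base initializer explicitly instead of through super().
        urllib.request.HTTPBasicAuthHandler.__init__(self, self.pwdmgr)


auth = AuthInfo(None, 'http://example.com/', 'user', 'secret')
```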