Tag: urllib2

I just want to download this URL... but it gives me an error! ... unicode ... (Python)

theurl = 'http://bit.ly/6IcCtf/'
urlReq = urllib2.Request(theurl)
urlReq.add_header('User-Agent',random.choice(agents))
urlResponse = urllib2.urlopen(urlReq)
htmlSource = urlResponse.read()
if unicode == 1:
    #print urlResponse.headers['content-type']
    #encoding=urlResponse.headers['content-type'].split('charset=')[-1]
    #htmlSource = unicode(htmlSource, encoding)
    htmlSource =  htmlSource.encode('utf8')
return htmlSource

Please look at the unicode part. I tried both options... but neither worked.

htmlSource =  htmlSource.encode('utf8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 370747: ordinal not in range(128)

When I try the longer encoding approach...

_mysql_exceptions.Warning: Incorrect string value: '\xE7\xB9\x81\xE9\xAB\x94...' for column 'html' at row 1
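
A possible explanation, sketched under assumptions: in Python 2, calling .encode('utf8') on a byte string first decodes it implicitly as ASCII, which fails on byte 0xe7. Decoding with the charset declared in the Content-Type header avoids that implicit step. The sketch below uses Python 3 syntax; the helper names charset_from_content_type and decode_body are hypothetical, and the UTF-8 fallback is an assumption. The MySQL warning may also depend on the column's character set, which this sketch does not address.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or `default`."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset() or default


def decode_body(raw_bytes, content_type):
    """Decode raw response bytes to text using the declared charset."""
    charset = charset_from_content_type(content_type)
    # errors="replace" keeps going on stray bytes instead of raising.
    return raw_bytes.decode(charset, errors="replace")


# Offline demonstration with the bytes quoted in the MySQL warning.
raw = b"\xe7\xb9\x81\xe9\xab\x94"
text = decode_body(raw, "text/html; charset=UTF-8")
print(len(text))
```

In Python 2 the equivalent step would be htmlSource.decode(encoding), which yields a unicode object that can then be encoded explicitly if bytes are needed.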

python unicode encode http urllib2

TIM*_*MEX

1
vote

1
answer

384
views

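How do I access response headers on error with urllib2?

I'm using the Harvest API (http://www.getharvest.com/api). When a client goes over its quota, a 503 response is returned. That response should include a header named "Retry-After" that tells me how long to wait before retrying.

How do I access the response headers when a call fails? I'm catching the HTTPError exception, but I can't figure out how to get the headers from it.

I can get the response body with exception.read(), but that is just the body, without the headers.

Some relevant code: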

try:
    request = urllib2.Request( url=self.uri+url, headers=self.headers )
    r = urllib2.urlopen(request)
    xml = r.read()
    return parseString( xml )
except urllib2.HTTPError as err:
    logger.debug("EXCEPTION: %s" % err.read() )
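
A minimal sketch of one way to read the header, assuming Python 3, where HTTPError lives in urllib.error and exposes the response headers as err.headers (Python 2's urllib2.HTTPError offers the same data through err.headers or err.info()). The helper name retry_after_seconds and the 60-second default are hypothetical. The error object is built by hand so the example runs without network access.

```python
import io
import urllib.error
from email.message import Message


def retry_after_seconds(err, default=60):
    """Read Retry-After from an HTTPError; fall back to `default` seconds."""
    value = err.headers.get("Retry-After")
    try:
        return int(value)
    except (TypeError, ValueError):
        # Header missing, or given as an HTTP date rather than seconds.
        return default


# Offline demonstration: build the kind of error urlopen() would raise.
headers = Message()
headers["Retry-After"] = "30"
demo_error = urllib.error.HTTPError(
    "http://example.invalid/", 503, "Service Unavailable",
    headers, io.BytesIO(b"quota exceeded"))
print(retry_after_seconds(demo_error))
```

Inside the except block from the question, the same call would be retry_after_seconds(err).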

python urllib2

Mar*_*hes

1
vote

1
answer

1628
views

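Python: get only the headers using urllib2

I have to implement a function that fetches only the headers (without doing a GET or POST) using urllib2. Here is my function: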

def getheadersonly(url, redirections = True):
    if not redirections:
        class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
            def http_error_302(self, req, fp, code, msg, headers):
                return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)
            http_error_301 = http_error_303 = http_error_307 = http_error_302
        cookieprocessor = urllib2.HTTPCookieProcessor()
        opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
        urllib2.install_opener(opener)

    class HeadRequest(urllib2.Request):
        def get_method(self):
            return "HEAD"

    info = {}
    info['headers'] = dict(urllib2.urlopen(HeadRequest(url)).info()) 
    info['finalurl'] = urllib2.urlopen(HeadRequest(url)).geturl() 
    return info

It uses code from this answer and this one. However, redirects are followed even when the flag is False. I tested it with:

print getheadersonly("http://ms.com", redirections = False)['finalurl']
print getheadersonly("http://ms.com")['finalurl']

It gives morganstanley.com in both cases. What is wrong here?
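
Two things in the function above look relevant: the custom handler delegates straight back to HTTPRedirectHandler.http_error_302, so it still follows the redirect, and install_opener changes the global opener for every later call. A hedged sketch of an alternative follows, in Python 3 syntax. The names NoRedirectHandler, build_head_opener and get_headers_only are hypothetical, and treating the 3xx HTTPError as the response is a design choice of this sketch, not something from the original answers.

```python
import urllib.error
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Refuse redirects: returning None lets the 3xx surface as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def build_head_opener(follow_redirects=True):
    """Build a private opener instead of installing a global one."""
    handlers = [] if follow_redirects else [NoRedirectHandler()]
    return urllib.request.build_opener(*handlers)


def get_headers_only(url, follow_redirects=True):
    """Issue a single HEAD request and return its headers and final URL."""
    opener = build_head_opener(follow_redirects)
    request = urllib.request.Request(url, method="HEAD")
    try:
        response = opener.open(request)
    except urllib.error.HTTPError as err:
        if follow_redirects or not 300 <= err.code < 400:
            raise
        response = err  # an HTTPError also carries headers and a URL
    with response:
        return {"headers": dict(response.headers),
                "finalurl": response.geturl()}
```

With follow_redirects=False, the returned headers would include the Location header of the first response rather than those of the redirect target. The function also sends one request instead of two.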

python urllib2

jer*_*use

2017 05-23

1
vote

1
answer

5581
views

urllib2 fails with https sites

When I use urllib2 to fetch an https page, it always fails with

Invalid url, unable to resolve

The URL is https://www.domainsbyproxy.com/default.aspx, but this happens for me on multiple https sites.

I'm using Python 2.7; below is the code I use to set up the connection

opener = urllib2.OpenerDirector()
opener.add_handler(urllib2.HTTPHandler())
opener.add_handler(urllib2.HTTPDefaultErrorHandler())
opener.addheaders = [('Accept-encoding', 'gzip')]
fetch_timeout = 12
response = opener.open(url, None, fetch_timeout)

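The reason I set up the handlers manually is that I don't want redirects to be handled (that part works fine). The above works for http requests, but fails for https.

Any clues?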
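
One likely cause, offered as a guess rather than a confirmed diagnosis: the hand-built OpenerDirector registers only HTTPHandler, so no handler is available to open https:// URLs. A minimal sketch in Python 3 syntax that adds HTTPSHandler while still leaving out the redirect handler; the function name build_opener_without_redirects is hypothetical. In Python 2 the corresponding class is urllib2.HTTPSHandler.

```python
import urllib.request


def build_opener_without_redirects():
    """Opener that handles http and https but never follows redirects."""
    opener = urllib.request.OpenerDirector()
    for handler in (urllib.request.HTTPHandler(),
                    urllib.request.HTTPSHandler(),  # missing in the question
                    urllib.request.HTTPDefaultErrorHandler()):
        opener.add_handler(handler)
    opener.addheaders = [("Accept-Encoding", "gzip")]
    return opener


opener = build_opener_without_redirects()
# Usage (needs network access):
# response = opener.open("https://www.domainsbyproxy.com/default.aspx", None, 12)
```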

python urllib2

Wiz*_*ard

1
vote

1
answer

5569
views

No such file or directory, weird or what?

So,

I've been writing a Downloader, and every time I run it, it says:

Traceback (most recent call last):
  File "C:\Python27\Downloader.py", line 7, in <module>
    f = open('c:\\users\%USERNAME%\AppData\Roaming\.minecraft\mods\CreeperCraft.zip', 'wb+')
IOError: [Errno 2] No such file or directory: 'c:\\users\\%USERNAME%\\AppData\\Roaming\\.minecraft\\mods\\CreeperCraft.zip'

I know you might say to create the file myself, but I want the script to create it.

So, can someone tell me what to fix? Here is the code:

import urllib2
import os
import shutil
url = "https://dl.dropbox.com/u/29251693/CreeperCraft.zip"
file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open('c:\\users\%USERNAME%\AppData\Roaming\.minecraft\mods\CreeperCraft.zip', 'wb+')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)
file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break
    file_size_dl += len(buffer)
    f.write(buffer) …
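
Two details in the path are worth checking: open() does not expand %USERNAME% (that is shell syntax, so Python looks for a folder literally named %USERNAME%), and open() creates the file but not any missing parent folders. A minimal sketch, in Python 3 syntax, that builds the path from the APPDATA environment variable and creates the mods folder first. The helper name mods_file_path and the fallback to the home directory when APPDATA is unset are assumptions.

```python
import os


def mods_file_path(filename, appdata=None):
    """Return <appdata>/.minecraft/mods/<filename>, creating the folder."""
    if appdata is None:
        # APPDATA is set on Windows; the home-directory fallback is a guess.
        appdata = os.environ.get("APPDATA", os.path.expanduser("~"))
    mods_dir = os.path.join(appdata, ".minecraft", "mods")
    os.makedirs(mods_dir, exist_ok=True)  # no error if it already exists
    return os.path.join(mods_dir, filename)


# Usage:
# with open(mods_file_path("CreeperCraft.zip"), "wb") as f:
#     f.write(data)
```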

python download urllib2

Pyt*_*tor

2012 08-08

1
vote

2
answers

1675
views

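urllib2 HTTPS truncated response

I'm trying to fetch a page using urllib2.urlopen (actually, I'm using mechanize, but this is the method mechanize calls). When I fetch the page, the response I get is incomplete; the page is truncated. However, if I access the non-HTTPS version of the page, I get the full page.

I'm on Arch Linux (3.5.4-1-ARCH x86_64), running openssl 1.0.1c. The problem also occurs on another Arch Linux machine I own, but not when using Python 3 (3.3.0).

The problem seems to be related to urllib2 not retrieving the whole HTTP response.

I tested it on the only online Python interpreter that would let me use urllib2 (Py I/O), and it works as expected there. Here is the code: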

import urllib2

u = urllib2.urlopen('https://wa151.avayalive.com/WAAdminPanel/login.aspx?ReturnUrl=%2fWAAdminPanel%2fprivate%2fHome.aspx')

print u.read()[-100:]

The last lines should contain the usual </body></html>.

When I try urllib.urlretrieve on my machine, I get:

ContentTooShortError: retrieval incomplete: got only 11365 out of 13805 bytes

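I can't test urlretrieve on the online interpreter because it doesn't let users write to temporary files. This evening, I'll try fetching the URL from my machine, but from a different location.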
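
The ContentTooShortError message compares the bytes received with the declared Content-Length. The same comparison can be made by hand on a urlopen result; a minimal sketch, where missing_bytes is a hypothetical helper that takes the body and the raw header value. It only detects truncation; it does not explain or fix it.

```python
def missing_bytes(body, content_length_header):
    """Return how many bytes short `body` is, or None if the length is unknown."""
    try:
        expected = int(content_length_header)
    except (TypeError, ValueError):
        # Header absent or malformed, e.g. with chunked transfer encoding.
        return None
    return max(expected - len(body), 0)


# The figures from the error message above: 11365 of 13805 bytes received.
shortfall = missing_bytes(b"x" * 11365, "13805")
print(shortfall)
```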

python https urllib2 urlopen

sle*_*anc

2017 05-23

1
vote

1
answer

4277
views

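UnicodeEncodeError when creating a .csv file

I'm trying to create a .csv file with data I've stored in a list from the Twitter search API. I saved the last 100 tweets for a keyword I chose (in this case 'reddit'), and I'm trying to save each tweet to a cell in the .csv file. My code is below, and the error I get is:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 0: ordinal not in range(128)

If anyone knows what I can do to fix this, I'd greatly appreciate it!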

import sys
import os


import urllib
import urllib2
import json
from pprint import pprint
import csv

import sentiment_analyzer

import codecs

class Twitter:
    def __init__(self):
        self.api_url = {}
        self.api_url['search'] = 'http://search.twitter.com/search.json?'

    def search(self, params):

        url = self.make_url(params, apitype='search')
        data = json.loads(urllib2.urlopen(url).read().decode('utf-8').encode('ascii',     'ignore'))

        txt = []
        for obj in data['results']:
            txt.append(obj['text'])

        return '\n'.join(txt)

    def make_url(self, params, apitype='search'):


        baseurl = self.api_url[apitype] 
        return baseurl + urllib.urlencode(params)


if __name__ …
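
The part of the script that writes the CSV is cut off above, so the following is only a sketch of the general technique, in Python 3 syntax: open the file as text with an explicit UTF-8 encoding so characters such as u'\u2019' are written without an implicit ASCII step. The function name write_tweets_csv and the one-tweet-per-row layout are assumptions. In Python 2, where the csv module works on byte strings, the usual counterpart is to encode each cell with .encode('utf-8') before writing.

```python
import csv


def write_tweets_csv(path, tweets):
    """Write one tweet per row to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        for text in tweets:
            writer.writerow([text])


# Usage:
# write_tweets_csv("tweets.csv", ["It\u2019s on reddit"])
```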

python csv urllib2

Nee*_*imo

2017 03-16

1
vote

1
answer

1476
views

urllib2 geturl() doesn't work for some URL redirects

I'm learning Python and trying to get urllib2's geturl() to work. So far I have the following skeleton:

import urllib2
gh = urllib2.urlopen("http://somewebsite.com/").geturl()
print gh

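This seems to work fine. However, when I try it with the URL given here, it fails to get the "final URL" (though it works in a browser).

I'd appreciate any guidance on solving this.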

python urllib2 python-2.7

AJW*_*AJW

1
vote

1
answer

3985
views

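Automating browser actions - error when clicking the submit button - "Click succeeded but Load Failed..."

I'm trying to write code that automatically logs in to two websites and goes to a certain page. I use Splinter.

I only get the error on the "Mijn ING Zakelijk" website, using PhantomJS as the browser type.

Until a few days ago, the code ran perfectly 20 times out of 20. But as of today I'm getting an error. Sometimes the code runs fine. Other times it doesn't, and gives me the "Click succeeded but Load Failed.." error. Here is the full traceback: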

## Attempting to login to Mijn ING Zakelijk, please wait.
- Starting the browser..
- Visiting the url..
- Filling the username form with the defined username..
- Filling the password form with the defined password..
- Clicking the submit button..
Traceback (most recent call last):
  File "/Users/###/Dropbox/Python/Test environment 2.7.3/Splinter.py", line 98, in <module>
    mijning()
  File "/Users/###/Dropbox/Python/Test environment 2.7.3/Splinter.py", line 27, in mijning
    attemptLogin(url2, username2, password2, defined_title2, website_name2, browser_type2)
  File "/Users/###/Dropbox/Python/Test …

python selenium webdriver urllib2 phantomjs

nar*_*ero

2013 11-08

1
vote

1
answer

3939
views

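How do I download a file from a redirected URL?

I need to download a file using the URL -> https://readthedocs.org/projects/django/downloads/pdf/latest/

That URL redirects to a URL with a .pdf file.

How can I download that file from that URL using Python?

I tried: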

import urllib
def download_file(download_url):
    web_file = urllib.urlopen(download_url)
    local_file = open('some_file.pdf', 'w')
    local_file.write(web_file.read())
    web_file.close()
    local_file.close()

if __name__ == 'main':
    download_file('https://readthedocs.org/projects/django/downloads/pdf/latest/')

But it doesn't work.
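
Three details in the attempt stand out: the guard compares __name__ with 'main' instead of '__main__', so download_file is never called; the file is opened in text mode ('w') although a PDF is binary; and in Python 3 (the question is tagged python-3.x) urlopen lives in urllib.request. A minimal sketch with those points changed follows. The dest_path parameter is an addition for illustration. urlopen follows HTTP redirects by default, so no extra handling should be needed for that part.

```python
import shutil
import urllib.request


def download_file(download_url, dest_path):
    """Stream the resource at `download_url` into `dest_path` as bytes."""
    with urllib.request.urlopen(download_url) as response, \
            open(dest_path, "wb") as local_file:
        # Copy in chunks so large files are not held in memory at once.
        shutil.copyfileobj(response, local_file)
    return dest_path


# Usage (needs network access), inside an `if __name__ == '__main__':` guard:
# download_file('https://readthedocs.org/projects/django/downloads/pdf/latest/',
#               'some_file.pdf')
```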

python urllib2 python-3.x python-requests

Nit*_*shu

2018 03-20

1
vote

1
answer

4623
views