How can I unshorten a URL using Python?

Question

bra*_*mat 6 python youtube curl urllib hyperlink

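My issue with the accepted answer (which uses the unshort.me API) is that I am focusing on unshortening YouTube links. Because unshort.me is so heavily used, almost 90% of the results come back with captchas, which I am unable to resolve.

So far I am stuck with using: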

import urllib2  # Python 2; in Python 3 this module is urllib.request

def unshorten_url(url):
    # urlopen follows redirects, so .url holds the final address
    resolvedURL = urllib2.urlopen(url)
    print resolvedURL.url

    #t = Test()
    #c = pycurl.Curl()
    #c.setopt(c.URL, 'http://api.unshort.me/?r=%s&t=xml' % (url))
    #c.setopt(c.WRITEFUNCTION, t.body_callback)
    #c.perform()
    #c.close()
    #dom = xml.dom.minidom.parseString(t.contents)
    #resolvedURL = dom.getElementsByTagName("resolvedURL")[0].firstChild.nodeValue
    return resolvedURL.url

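Note: everything in the comments is what I tried while using the unshort.me service, which was returning captcha links.

Does anyone know of a more efficient way to do this without using open (since it wastes bandwidth)?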

Answer 1

Ped*_*iro 15

Use the highest-rated answer (not the accepted answer) from that question:

# This is for Py2k.  For Py3k, use http.client and urllib.parse instead.
import httplib
import urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    # Note: this always opens a plain http connection, whatever the scheme
    h = httplib.HTTPConnection(parsed.netloc)
    resource = parsed.path
    if parsed.query != "":
        resource += "?" + parsed.query
    h.request('HEAD', resource)
    response = h.getresponse()
    # // is integer division in both Py2k and Py3k
    if response.status // 100 == 3 and response.getheader('Location'):
        return unshorten_url(response.getheader('Location')) # changed to process chains of short urls
    else:
        return url

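As a follow-up, I just remembered why this approach did not work for me. I am working on a Twitter app, and there are cases where the URL is shortened twice (this happens a lot). For example, it will take this video [u't.co/LszdhNP'] and return the URL etsy.me/r6JBGq, when what I actually need is the final YouTube address it links to. Do you know how to get around this? (2 upvotes)
Made a simple change in my answer (2 upvotes)
Some sites (e.g. Twitter) try to force a redirect from http to https. In that case your solution will loop forever, because every connection is assumed to be http and will keep seeing the redirect header. To verify this, try running unshorten_url("http://t.co/t"). I suggest checking parsed.scheme and optionally using httplib.HTTPSConnection(). (2 upvotes)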
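
The suggestion in the last comment could be sketched for Python 3 roughly as follows. This is not the answer author's code: `head_once`, the `fetch` parameter, `MAX_HOPS`, and the timeout value are assumptions introduced here for illustration. The sketch picks the connection class from the URL scheme, resolves relative `Location` headers against the current URL, and stops after a fixed number of hops instead of recursing without a limit.

```python
import http.client
from urllib.parse import urljoin, urlsplit

MAX_HOPS = 10  # hypothetical safety limit, not part of the original answer


def head_once(url, timeout=10):
    """Send one HEAD request and return (status, Location header or None)."""
    parts = urlsplit(url)
    # Choose the connection class from the scheme, as the comment suggests.
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    resource = parts.path or "/"
    if parts.query:
        resource += "?" + parts.query
    try:
        conn.request("HEAD", resource)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def unshorten_url(url, fetch=head_once, max_hops=MAX_HOPS):
    """Follow redirects one hop at a time and return the last URL reached."""
    for _ in range(max_hops):
        status, location = fetch(url)
        if status // 100 != 3 or not location:
            return url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return url  # gave up after max_hops redirects
```

Calling `unshorten_url("http://t.co/t")` with this version should move on to the https URL rather than requesting the http one again. The `fetch` parameter exists only so the redirect-following logic can be exercised without network access.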

Answer 2

ber*_*sam 13

A one-line function using the requests library, and yes, it supports recursion.

import requests

def unshorten_url(url):
    # HEAD avoids downloading the body; requests follows the whole redirect chain
    return requests.head(url, allow_redirects=True).url
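
When a link has been shortened more than once, it can help to see every hop and not only the final address. The sketch below builds on the one-liner above and reads the intermediate responses that requests keeps in `response.history`. The function name `unshorten_with_chain`, the optional `session` argument, and the `timeout` value are assumptions added for illustration.

```python
import requests


def unshorten_with_chain(url, session=None, timeout=10):
    """Return (final_url, [URLs of the redirecting hops, in order])."""
    # A requests.Session can be passed in to reuse connections across calls;
    # without one, fall back to the module-level helper.
    http = requests if session is None else session
    response = http.head(url, allow_redirects=True, timeout=timeout)
    # response.history lists the redirect responses, oldest first.
    hops = [r.url for r in response.history]
    return response.url, hops
```

For a link such as the t.co example from the comments, `hops` would hold the shortener URLs that redirected and the first element of the returned tuple would be the address they finally resolve to.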

Asked:	14 years, 4 months ago
Viewed:	5413 times
Last active:	10 years, 6 months ago