How to construct a relative URL in Python given two absolute URLs

Question

Is there a built-in function that produces a URL like ../images.html, given a base URL such as http://www.example.com/faq/index.html and a target URL such as http://www.example.com/images.html?

I checked the urlparse module. What I want is the counterpart of the urljoin() function.

Answer 1

unu*_*tbu 10

You can use urlparse.urlparse to extract the paths, and the posixpath version of os.path.relpath to compute the relative path.

(Warning: this works on Linux, but may not work on Windows):

import urlparse  # Python 2 module; renamed urllib.parse in Python 3
import posixpath

def relurl(target,base):
    base=urlparse.urlparse(base)
    target=urlparse.urlparse(target)
    # A relative URL only makes sense within the same host.
    if base.netloc != target.netloc:
        raise ValueError('target and base netlocs do not match')
    # Prefix '.' so that an empty path and '/' are treated alike.
    base_dir='.'+posixpath.dirname(base.path)
    target='.'+target.path
    return posixpath.relpath(target,start=base_dir)

tests=[
    ('http://www.example.com/images.html','http://www.example.com/faq/index.html','../images.html'),
    ('http://google.com','http://google.com','.'),
    ('http://google.com','http://google.com/','.'),
    ('http://google.com/','http://google.com','.'),
    ('http://google.com/','http://google.com/','.'), 
    ('http://google.com/index.html','http://google.com/','index.html'),
    ('http://google.com/index.html','http://google.com/index.html','index.html'), 
    ]

for target,base,answer in tests:
    try:
        result=relurl(target,base)
    except ValueError as err:
        print('{t!r},{b!r} --> {e}'.format(t=target,b=base,e=err))
    else:
        if result==answer:
            print('{t!r},{b!r} --> PASS'.format(t=target,b=base))
        else:
            print('{t!r},{b!r} --> {r!r} != {a!r}'.format(
                t=target,b=base,r=result,a=answer))
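
For readers on Python 3, where the urlparse module became urllib.parse, the function above could be ported roughly as follows. This is only a sketch and not part of the original answer; the names base_parts and target_parts are introduced here for readability.

```python
import posixpath
from urllib.parse import urlparse  # Python 3 home of the old urlparse module


def relurl(target, base):
    """Return the path of `target` relative to the directory of `base`."""
    base_parts = urlparse(base)
    target_parts = urlparse(target)
    # A relative URL only makes sense within the same host.
    if base_parts.netloc != target_parts.netloc:
        raise ValueError('target and base netlocs do not match')
    # Prefix '.' so that an empty path and '/' are treated alike.
    base_dir = '.' + posixpath.dirname(base_parts.path)
    target_path = '.' + target_parts.path
    return posixpath.relpath(target_path, start=base_dir)


print(relurl('http://www.example.com/images.html',
             'http://www.example.com/faq/index.html'))
```

Like the original, this version ignores the query string and fragment of the target.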

@yasar11732: use posixpath.relpath() (3 upvotes)
All tests pass on Win7 with Python 2.7 (2 upvotes)

Answer 2

red*_*dow 5

The first solution that comes to mind is:

>>> import os.path
>>> os.path.relpath('/images.html', os.path.dirname('/faq/index.html'))
'../images.html'
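
On Windows, os.path is the ntpath module and uses backslashes, so the one-liner is platform dependent. A small sketch of the same call made through posixpath, which always uses forward slashes; the variable name rel is just for illustration:

```python
import posixpath

# posixpath uses '/' as the separator regardless of the host OS,
# which matches how URL paths are written.
rel = posixpath.relpath('/images.html', posixpath.dirname('/faq/index.html'))
print(rel)
```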

Of course, a full solution needs URL parsing -> domain comparison (!!) -> path rewriting, if the domains match -> re-adding the query and fragment.

Edit: a more complete version

import urlparse
import posixpath

def relative_url(destination, source):
    u_dest = urlparse.urlsplit(destination)
    u_src = urlparse.urlsplit(source)

    _uc1 = urlparse.urlunsplit(u_dest[:2]+tuple('' for i in range(3)))
    _uc2 = urlparse.urlunsplit(u_src[:2]+tuple('' for i in range(3)))

    if _uc1 != _uc2:
        # Different scheme or host: keep the absolute URL
        return destination

    _relpath = posixpath.relpath(u_dest.path, posixpath.dirname(u_src.path))

    return urlparse.urlunsplit(('', '', _relpath, u_dest.query, u_dest.fragment))

Then:

>>> relative_url('http://www.example.com/images.html', 'http://www.example.com/faq/index.html')
'../images.html'
>>> relative_url('http://www.example.com/images.html?my=query&string=here#fragment', 'http://www.example.com/faq/index.html')
'../images.html?my=query&string=here#fragment'
>>> relative_url('http://www.example.com/images.html', 'http://www2.example.com/faq/index.html')
'http://www.example.com/images.html'
>>> relative_url('https://www.example.com/images.html', 'http://www.example.com/faq/index.html')
'https://www.example.com/images.html'

Edit: now uses the posixpath implementation of os.path so that it also works on Windows.
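
Since the question asks for the counterpart of urljoin(), one way to sanity-check a result is to feed it back through urljoin() and compare it with the original target. The following is a hedged Python 3 adaptation of relative_url, not the answerer's exact code; treating an empty path as '/' is an assumption added here so that relpath never falls back to the current working directory.

```python
import posixpath
from urllib.parse import urljoin, urlsplit, urlunsplit


def relative_url(destination, source):
    """Python 3 adaptation of the relative_url function above."""
    u_dest = urlsplit(destination)
    u_src = urlsplit(source)
    # Different scheme or host: a relative URL is not possible.
    if u_dest[:2] != u_src[:2]:
        return destination
    # Assumption: an empty path is treated as '/'.
    dest_path = u_dest.path or '/'
    src_dir = posixpath.dirname(u_src.path or '/')
    rel = posixpath.relpath(dest_path, src_dir)
    return urlunsplit(('', '', rel, u_dest.query, u_dest.fragment))


base = 'http://www.example.com/faq/index.html'
target = 'http://www.example.com/images.html?my=query#fragment'
rel = relative_url(target, base)
print(rel)
# Round trip: joining the relative URL onto the base should give the target.
print(urljoin(base, rel) == target)
```

The round trip is not guaranteed for every input: posixpath.relpath normalizes paths, so a target whose path ends in '/' loses its trailing slash.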
