Python: confusion with urljoin

Question

I am trying to build URLs from different pieces and cannot understand how this method behaves. For example:

Python 3.x

from urllib.parse import urljoin

>>> urljoin('some', 'thing')
'thing'
>>> urljoin('http://some', 'thing')
'http://some/thing'
>>> urljoin('http://some/more', 'thing')
'http://some/thing'
>>> urljoin('http://some/more/', 'thing') # just a tad / after 'more'
'http://some/more/thing'
>>> urljoin('http://some/more/', '/thing')
'http://some/thing'

Can you explain the exact behavior of this method?

Answer 1

sbe*_*rry 73

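The best way (for me) to think of this is that the first argument, base, is like the page you are on in your browser. The second argument, url, is the href of an anchor on that page. The result is the final URL you are taken to when you click the link.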

>>> urljoin('some', 'thing')
'thing'

This one makes sense given my description, though one would hope that base includes a scheme and a domain.

>>> urljoin('http://some', 'thing')
'http://some/thing'

If you are on the virtual host some and the page has an anchor like <a href='thing'>Foo</a>, then the link takes you to http://some/thing

>>> urljoin('http://some/more', 'thing')
'http://some/thing'

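Here we are on some/more, so the relative link thing takes us to http://some/thing: with no trailing slash, the last path segment (more) is treated as a page name and is replaced.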

>>> urljoin('http://some/more/', 'thing') # just a tad / after 'more'
'http://some/more/thing'

Here we are not on some/more, we are on some/more/, which is different. Now our relative link takes us to some/more/thing

>>> urljoin('http://some/more/', '/thing')
'http://some/thing'

And lastly: if you are on some/more/ and the href is /thing, you are linked to some/thing.

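Thanks for the explanation... This behavior makes one look for a "true" `urljoin`, similar to `os.path.join` (7 upvotes)
For those who just want to append one URL onto another without the urljoin logic, posixpath.join() may work for you. (2 upvotes)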
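
The posixpath.join() suggestion in the comment above can be sketched as follows. This is only a minimal illustration: append_path is a hypothetical helper name, not part of the standard library, and it assumes the base carries no query string or fragment. Leading slashes are stripped from the segments because posixpath.join discards everything that precedes a component starting with '/'.

```python
import posixpath


def append_path(base, *segments):
    """Hypothetical helper: append path segments to base, os.path.join style."""
    # posixpath.join drops everything before a component that starts with '/',
    # so leading slashes are stripped from each segment first.
    cleaned = [segment.lstrip('/') for segment in segments]
    return posixpath.join(base, *cleaned)


print(append_path('http://some/more', 'thing'))    # http://some/more/thing
print(append_path('http://some/more/', '/thing'))  # http://some/more/thing
```

Unlike urljoin, this never replaces the last segment of the base, which is the behavior the first comment was looking for.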

Answer 2

Bar*_*ing 6

urllib.parse.urljoin(base, url)

If url is an absolute URL (that is, it starts with //, http://, https://, ...), the host name and/or scheme of url will be present in the result. For example:

>>> urljoin('https://www.google.com', '//www.microsoft.com')
'https://www.microsoft.com'
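
The "and/or" in that rule can be seen by comparing a scheme-relative reference with a fully absolute one. The following is a small sketch; the /search and /windows paths are made up for illustration.

```python
from urllib.parse import urljoin

base = 'https://www.google.com/search'

# Scheme-relative reference: the host comes from url, the scheme from base.
scheme_relative = urljoin(base, '//www.microsoft.com/windows')

# Fully absolute reference: both scheme and host come from url.
absolute = urljoin(base, 'http://www.microsoft.com/windows')

print(scheme_relative)  # https://www.microsoft.com/windows
print(absolute)         # http://www.microsoft.com/windows
```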

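Otherwise, urllib.parse.urljoin(base, url) will

construct a full ("absolute") URL by combining a "base URL" (base) with another URL (url). Informally, this uses components of the base URL, in particular the addressing scheme, the network location and (part of) the path, to provide missing components in the relative URL.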

>>> from urllib.parse import urljoin, urlparse
>>> urlparse('http://a/b/c/d/e')
ParseResult(scheme='http', netloc='a', path='/b/c/d/e', params='', query='', fragment='')
>>> urljoin('http://a/b/c/d/e', 'f')
'http://a/b/c/d/f'
>>> urlparse('http://a/b/c/d/e/')
ParseResult(scheme='http', netloc='a', path='/b/c/d/e/', params='', query='', fragment='')
>>> urljoin('http://a/b/c/d/e/', 'f')
'http://a/b/c/d/e/f'

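It takes the path of the first argument (base), strips everything after the last /, and joins the result with the second argument (url).

If url starts with /, it joins the scheme and netloc of base with url: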

>>> urljoin('http://a/b/c/d/e', '/f')
'http://a/f'
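
The two path rules described in this answer can be written out as a deliberately simplified sketch. simple_join is a hypothetical name, and this is not how urljoin is actually implemented: it ignores absolute and //-prefixed urls, ".." segments, queries and fragments, and only models plain relative paths and paths rooted at /.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def simple_join(base, url):
    """Hypothetical, simplified model of urljoin for plain relative paths."""
    parts = urlsplit(base)
    if url.startswith('/'):
        # Rooted path: keep only the scheme and netloc of base.
        path = url
    else:
        # Drop everything after the last '/' of the base path, then append url.
        directory = parts.path.rpartition('/')[0]
        path = directory + '/' + url
    return urlunsplit((parts.scheme, parts.netloc, path, '', ''))


# Compare the sketch with urljoin on the cases discussed above.
for base, url in [
    ('http://a/b/c/d/e', 'f'),
    ('http://a/b/c/d/e/', 'f'),
    ('http://a/b/c/d/e', '/f'),
    ('http://some', 'thing'),
]:
    print(simple_join(base, url), simple_join(base, url) == urljoin(base, url))
```

For these four inputs the sketch agrees with urljoin; for anything outside the stated assumptions, use urljoin itself.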
