How do I get an absolute URL from XPath?

Question

How do I get an absolute URL from XPath?

I am using the following code to get an item's URL:

node.xpath('//td/a[starts-with(text(),"itunes")]')[0].attrib['href']


It gives me something like:

itunes20170107.tbz


However, I would like to get the full URL, i.e.:

https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/itunes20170109.tbz


Is there a simple way to get the full URL from lxml without building it myself?

Answer 1

ale*_*cxe 7

lxml.html simply parses the href exactly as it appears in the HTML. To make the link absolute rather than relative, use urljoin():

from urllib.parse import urljoin  # Python 3
# from urlparse import urljoin  # Python 2

# The trailing slash matters: without it, urljoin() replaces the last
# path segment ("current") instead of appending to it.
url = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/"

relative_url = node.xpath('//td/a[starts-with(text(),"itunes")]')[0].attrib['href']
absolute_url = urljoin(url, relative_url)


Demo:

>>> from urllib.parse import urljoin  # Python 3
>>> 
>>> url = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/"
>>> 
>>> relative_url = "itunes20170107.tbz"
>>> absolute_url = urljoin(url, relative_url)
>>> absolute_url
'https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/itunes20170107.tbz'

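One detail of urljoin() is easy to miss: whether the base URL ends with a slash changes the result. The sketch below compares the two cases using the base URL and file name from the question; the variable names are only illustrative.

```python
from urllib.parse import urljoin

href = "itunes20170107.tbz"
base = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current"

# No trailing slash: "current" is treated like a file name and gets replaced.
without_slash = urljoin(base, href)

# Trailing slash: "current/" is treated as a directory and the href is appended.
with_slash = urljoin(base + "/", href)

print(without_slash)
print(with_slash)
```

If the page was fetched from a URL that already ends in a slash, passing that URL unchanged as the base should give the second form.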

Answer 2

El *_*uso 5

Another approach:

import requests
from lxml.html import fromstring

url = 'http://server.com'
response = requests.get(url)
tree = fromstring(response.text)
# Rewrites every link in the parsed document to an absolute URL, in place.
tree.make_links_absolute(url)

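To tie this back to the XPath from the question, here is a minimal offline sketch. It assumes a made-up HTML snippet standing in for the downloaded directory listing and uses the question's URL (with a trailing slash) as the base. After make_links_absolute() has run, the same XPath returns the full URL directly.

```python
from lxml.html import fromstring

# Hypothetical stand-in for the page that would normally be downloaded.
html = (
    "<html><body><table><tr>"
    '<td><a href="itunes20170107.tbz">itunes20170107.tbz</a></td>'
    "</tr></table></body></html>"
)
base_url = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/"

doc = fromstring(html)
doc.make_links_absolute(base_url)  # rewrites href attributes in place

href = doc.xpath('//td/a[starts-with(text(),"itunes")]')[0].attrib['href']
print(href)
```

The link text is left untouched, so the starts-with(text(), ...) predicate still matches after the rewrite.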
