Hoo*_*ini 5 python scrapy python-requests
I have a link that redirects to an external website, and I want to find out the final URL it redirects to. I tried:
requests.get("link.which.redirects.and.has.dynamic.js.code.com")
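But I can't get the final redirected URL this way, because it is built dynamically. I'm not sure exactly what happens, but loading the page involves some JavaScript code, and the end result is a redirect to an external page.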
So instead, I tried Selenium with ChromeDriverManager.
import time

import scrapy
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager


class MySpider(scrapy.Spider):
    name = 'my_spider'

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Selenium 3 style call; Selenium 4 expects service=Service(path) instead
        self.driver = webdriver.Chrome(ChromeDriverManager().install())

    def parse(self, response):
        link = "link.which.redirects.and.has.dynamic.js.code.com"
        self.driver.get(link)
        time.sleep(1)  # without this wait, driver.current_url is not the final redirect
        url = self.driver.current_url
The code above loads the entire page just to get the redirect URL. Is there a more efficient way to get it?
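If the browser approach is kept, the fixed `time.sleep(1)` is fragile: it is too short on a slow load and wasted time on a fast one. A minimal sketch of polling instead is shown below. The helper name `wait_for_external_url` and its parameters are hypothetical, it assumes `start_url` is a full URL including the scheme, and it treats "final" as "on a different host and unchanged for `settle` seconds", which is only a heuristic.

```python
import time
from urllib.parse import urlsplit


def wait_for_external_url(driver, start_url, timeout=10.0, settle=0.5, poll=0.1):
    """Poll driver.current_url until it is on another host and stops changing.

    `driver` is any object exposing a `current_url` attribute, for example
    a Selenium WebDriver. Raises TimeoutError if no stable external URL is
    seen within `timeout` seconds.
    """
    start_host = urlsplit(start_url).hostname
    deadline = time.monotonic() + timeout
    last_url = driver.current_url
    last_change = time.monotonic()
    while time.monotonic() < deadline:
        current = driver.current_url
        now = time.monotonic()
        if current != last_url:
            # The URL moved again, so restart the "settled" timer.
            last_url, last_change = current, now
        elif urlsplit(current).hostname != start_host and now - last_change >= settle:
            return current
        time.sleep(poll)
    raise TimeoutError(
        f"no stable external URL within {timeout} s (last seen: {last_url})"
    )
```

In the spider above, this would stand in for the sleep: `url = wait_for_external_url(self.driver, link)` right after `self.driver.get(link)`.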
To find the final URL, you can use the following code:
import requests

# requests follows HTTP redirects (3xx) only; it does not run JavaScript
website = requests.get("link.which.redirects.and.has.dynamic.js.code.com")
print(website.url)  # final URL after any HTTP redirects
print(website.history)  # list of the intermediate redirect responses
print(website.is_redirect)  # whether this response is itself a redirect
print(website.is_permanent_redirect)  # whether it is a permanent (301/308) redirect
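Note that requests does not execute JavaScript, so `website.url` only reflects redirects done at the HTTP level. When the target URL is written literally in the page source (a meta refresh tag or a plain `location` assignment), it can sometimes be pulled out of the HTML without a browser. The sketch below is only that: the helper name `find_client_side_redirect` and the three patterns are assumptions covering common simple cases, and a URL assembled at runtime by a script will not be found this way.

```python
import re
from urllib.parse import urljoin

# e.g. <meta http-equiv="refresh" content="0; url=https://example.org/">
# (assumes http-equiv appears before content inside the tag)
META_REFRESH = re.compile(
    r"""<meta[^>]+http-equiv=["']?refresh["']?[^>]*"""
    r"""content=["']?\s*\d+\s*;\s*url=\s*["']?([^"'>\s]+)""",
    re.IGNORECASE,
)
# e.g. window.location.href = "https://example.org/"
JS_ASSIGN = re.compile(
    r"""\b(?:window\.|document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']"""
)
# e.g. location.replace("https://example.org/")
JS_CALL = re.compile(
    r"""\blocation\.(?:replace|assign)\(\s*["']([^"']+)["']\s*\)"""
)


def find_client_side_redirect(html, base_url):
    """Return the first literal redirect target found in html, or None.

    Relative targets are resolved against base_url.
    """
    for pattern in (META_REFRESH, JS_ASSIGN, JS_CALL):
        match = pattern.search(html)
        if match:
            return urljoin(base_url, match.group(1))
    return None
```

With the response from the code above, the call would be `find_client_side_redirect(website.text, website.url)`. A result of `None` means the page needs a real browser, as in the Selenium approach.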