Access denied when scraping

V.A*_*Anh 2 python beautifulsoup web-scraping

I want to create a script that visits https://www.size.co.uk/featured/footwear/ and scrapes its content, but for some reason access is denied when I run the script. Here is the code:

from urllib import urlopen
from bs4 import BeautifulSoup as BS
url = urlopen('https://www.size.co.uk/')
print BS(url, 'lxml')

The output is:

<html><head>
<title>Access Denied</title>
</head><body>
<h1>Access Denied</h1>

You don't have permission to access "http://www.size.co.uk/" on this server.
<p>
Reference #18.6202655f.1498945327.11002828
</p></body>
</html>

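The code works fine when I try it on other websites, and the error does not appear when I use Selenium, but I would still like to know how to get around this error without Selenium. However, when I use Selenium on a different site such as http://www.footpatrol.co.uk/shop, I get the same Access Denied error. Here is the code for footpatrol: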

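from bs4 import BeautifulSoup as BS
from selenium import webdriver

# Raw string so the backslashes in the Windows path are not read as escapes
driver = webdriver.PhantomJS(r'C:\Users\V\Desktop\PY\web_scrape\phantomjs.exe')
driver.get('http://www.footpatrol.com')
pageSource = driver.page_source
soup = BS(pageSource, 'lxml')
print soup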

The output is:

<html><head>
<title>Access Denied</title>
</head><body>
<h1>Access Denied</h1>

You don't have permission to access "http://www.footpatrol.co.uk/" on this 
server.<p>
Reference #18.6202655f.1498945644.110590db


</p></body></html>

Dmi*_*kiy 8

import requests
from bs4 import BeautifulSoup as BS

url = 'https://www.size.co.uk/'
# Send a browser-like User-Agent header instead of the library default
agent = {"User-Agent":'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
page = requests.get(url, headers=agent)
print(BS(page.content, 'lxml'))
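The answer's code differs from the question's mainly in that it sends a browser-like User-Agent header. urllib identifies itself as "Python-urllib/x.y" by default, which is presumably what the server rejects, although the site may apply other checks as well. As a rough sketch of the same idea without the requests dependency, assuming Python 3 and the standard library only, something like the following could be tried. The names BROWSER_UA, build_request and fetch are hypothetical helpers, not part of any library.

```python
from urllib.request import Request, urlopen

# Assumption: any current desktop-browser User-Agent string would do;
# this one is copied from the answer above.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36")


def build_request(url, user_agent=BROWSER_UA):
    """Return a urllib Request that carries a browser-like User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Download the page body as bytes (this performs a real network call)."""
    with urlopen(build_request(url), timeout=10) as response:
        return response.read()
```

The bytes returned by fetch('https://www.size.co.uk/') can be passed to BS(..., 'lxml') exactly as in the question. If the server still answers with Access Denied, it is probably checking more than the User-Agent.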