我想使用名为BeautifulSoup的库来抓取网站的内容。
码:
from bs4 import BeautifulSoup
from urllib.request import urlopen
html_http_response = urlopen("http://www.airlinequality.com/airport-reviews/jeddah-airport/")
data = html_http_response.read()
soup = BeautifulSoup(data, "html.parser")
print(soup.prettify())
Run Code Online (Sandbox Code Playgroud)
输出:
<html style="height:100%">
<head>
<meta content="NOINDEX, NOFOLLOW" name="ROBOTS"/>
<meta content="telephone=no" name="format-detection"/>
<meta content="initial-scale=1.0" name="viewport"/>
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
</head>
<body style="margin:0px;height:100%">
<iframe frameborder="0" height="100%" marginheight="0px" marginwidth="0px" src="/_Incapsula_Resource?CWUDNSAI=9&xinfo=9-57435048-0%200NNN%20RT%281512733380259%202%29%20q%280%20-1%20-1%20-1%29%20r%280%20-1%29%20B12%284%2c315%2c0%29%20U19&incident_id=466002040110357581-305794245507288265&edet=12&cinfo=04000000" width="100%">
Request unsuccessful. Incapsula incident ID: 466002040110357581-305794245507288265
</iframe>
</body>
</html>
Run Code Online (Sandbox Code Playgroud)
从浏览器检查内容时,主体包含iFrame balise,而不是显示的内容。