mav*_*ick 0 python beautifulsoup web-scraping python-requests
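Consider this website: https://dlnr.hawaii.gov/dsp/parks/oahu/ahupuaa-o-kahana-state-park/
I'm looking for the content under the headings on the right-hand side. Here is my sample code, which should return the list of contents but returns empty strings instead: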
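import requests as req
from bs4 import BeautifulSoup as bs

# Download the static HTML of the park page.
r = req.get('https://dlnr.hawaii.gov/dsp/parks/oahu/ahupuaa-o-kahana-state-park/').text
# Name the parser explicitly so bs4 does not have to guess one.
soup = bs(r, 'html.parser')
# Locate the "Facilities" heading, then print everything that follows it.
par = soup.find('h3', text='Facilities')
for sib in par.next_siblings:
    print(sib)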
This returns:
<ul class="park_icon">
<div class="clearfix"></div>
</ul>
The rendered site doesn't show any div element with that class. Also, the list items are not captured.
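The facilities and the other information in that frame are loaded dynamically by JavaScript, so bs4 can't see them in the HTML source: they simply aren't there.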
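A quick way to confirm this is to count the tags in the static markup. The sketch below uses the standard library's html.parser instead of bs4, `TagCounter` is a hypothetical helper, and the input is the snippet the question reports getting back, not a live download:

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Hypothetical helper: count the start tags seen in an HTML string."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


# The markup the question reports for the static page.
static_html = '<ul class="park_icon">\n<div class="clearfix"></div>\n</ul>'

counter = TagCounter()
counter.feed(static_html)

# No <li> elements exist before JavaScript runs, so the count is zero.
li_count = counter.counts.get("li", 0)
print(li_count)
```

If the count were non-zero, the items would be in the source and the problem would lie in the bs4 query instead.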
However, you can query the endpoint directly and get all the information you need.
Here's how:
import json
import re
import time

import requests

# Browser-like headers sent with the request.
headers = {
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/90.0.4430.93 Safari/537.36",
    "referer": "https://dlnr.hawaii.gov/",
}

# parkId identifies the park; the "_" parameter is the current Unix timestamp.
endpoint = f"https://stateparksadmin.ehawaii.gov/camping/park-site.json?parkId=57853&_={int(time.time())}"
response = requests.get(endpoint, headers=headers).text

# The body is wrapped as callback(...); so extract the JSON inside the parentheses.
data = json.loads(re.search(r"callback\((.*)\);", response).group(1))
print("\n".join(data["park info"]["facilities"]))
Output:
Boat Ramp
Campsites
Picnic table
Restroom
Showers
Trash Cans
Water Fountain
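The regex in the code above suggests the endpoint wraps its JSON in a `callback(...);` call (JSONP style). Here is a minimal offline sketch of that unwrapping step. `unwrap_jsonp` is a hypothetical helper name, and the `sample` string is a shortened, made-up response body used only for illustration:

```python
import json
import re


def unwrap_jsonp(text):
    """Return the JSON payload from a string shaped like 'callback({...});'."""
    # re.DOTALL lets the payload span several lines.
    match = re.search(r"callback\((.*)\);", text, re.DOTALL)
    if match is None:
        raise ValueError("response does not look like callback(...);")
    return json.loads(match.group(1))


# Hypothetical, shortened response body (not a real server reply).
sample = 'callback({"park info": {"facilities": ["Boat Ramp", "Campsites"]}});'

data = unwrap_jsonp(sample)
print("\n".join(data["park info"]["facilities"]))
```

Raising an error when the wrapper is missing gives a clearer message than the `AttributeError` you would get from calling `.group(1)` on `None`.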
Here is the entire JSON:
{
    "park info": {
        "name": "Ahupua\u02bba \u02bbO Kahana State Park",
        "id": 57853,
        "island": "Oahu",
        "activities": [
            "Beachgoing",
            "Camping",
            "Dogs on Leash",
            "Fishing",
            "Hiking",
            "Hunting",
            "Sightseeing"
        ],
        "facilities": [
            "Boat Ramp",
            "Campsites",
            "Picnic table",
            "Restroom",
            "Showers",
            "Trash Cans",
            "Water Fountain"
        ],
        "prohibited": [
            "No Motorized Vehicles/ATV's",
            "No Alcoholic Beverages",
            "No Open Fires",
            "No Smoking",
            "No Commercial Activities"
        ],
        "hazards": [],
        "photos": [],
        "location": {
            "latitude": 21.556086,
            "longitude": -157.875579
        },
        "hiking": [
            {
                "name": "Nakoa Trail",
                "id": 17,
                "activities": [
                    "Dogs on Leash",
                    "Hiking",
                    "Hunting",
                    "Sightseeing"
                ],
                "facilities": [
                    "No Drinking Water"
                ],
                "prohibited": [
                    "No Bicycles",
                    "No Open Fires",
                    "No Littering/Dumping",
                    "No Camping",
                    "No Smoking"
                ],
                "hazards": [
                    "Flash Flood"
                ],
                "photos": [],
                "location": {
                    "latitude": 21.551087,
                    "longitude": -157.881228
                },
                "has_google_street": false
            },
            {
                "name": "Kapa\u2018ele\u2018ele Trail",
                "id": 18,
                "activities": [
                    "Dogs on Leash",
                    "Hiking",
                    "Sightseeing"
                ],
                "facilities": [
                    "No Drinking Water",
                    "Restroom",
                    "Trash Cans"
                ],
                "prohibited": [
                    "No Bicycles",
                    "No Open Fires",
                    "No Littering/Dumping",
                    "No Camping",
                    "No Smoking"
                ],
                "hazards": [],
                "photos": [],
                "location": {
                    "latitude": 21.554744,
                    "longitude": -157.876601
                },
                "has_google_street": false
            }
        ]
    }
}
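Since the payload also nests trail records under "hiking", the same dictionary can be walked for per-trail details. This is a rough sketch over a trimmed copy of the JSON above (only the keys it reads are kept). `trail_hazards` is a hypothetical helper name, not part of any library:

```python
import json

# Trimmed copy of the JSON above; a raw string so json handles the \u escapes.
raw = r'''
{
    "park info": {
        "name": "Ahupua\u02bba \u02bbO Kahana State Park",
        "hiking": [
            {"name": "Nakoa Trail", "hazards": ["Flash Flood"]},
            {"name": "Kapa\u2018ele\u2018ele Trail", "hazards": []}
        ]
    }
}
'''


def trail_hazards(data):
    """Return one summary line per trail listed under 'hiking'."""
    rows = []
    # .get() tolerates parks that have no "hiking" key at all.
    for trail in data["park info"].get("hiking", []):
        hazards = ", ".join(trail["hazards"]) or "none listed"
        rows.append(f'{trail["name"]}: hazards = {hazards}')
    return rows


rows = trail_hazards(json.loads(raw))
print("\n".join(rows))
```

The same loop works for "facilities" or "prohibited" by swapping the key, because each trail record repeats the list-valued keys of the park itself.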