gyt*_*hon 5 python web-scraping python-requests
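I want to scrape data from a web page that contains a dynamic table. The table holds information about train rides.
This is the website: https://www.laerm-monitoring.de/zug/?mp=3/
I tried to request the data with a simple session that has a retry adapter mounted, but I only get the basic HTML back, without the data in the table.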
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def requests_retry_session(
    retries=3,
    backoff_factor=0.3,
    status_forcelist=(500, 502, 504, 429),
    session=None,
):
    # Reuse the given session, or create a new one
    session = session or requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    # Mount the retry policy for both HTTP and HTTPS requests
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session


session = requests_retry_session()
response = session.get('https://www.laerm-monitoring.de/zug/?mp=3/')
response.content
How can I do this correctly?
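The data is loaded dynamically from a different URL, so it is not part of the HTML that your GET request returns. You can load it with just requests and beautifulsoup: fetch the page, read the __RequestVerificationToken from it, then POST that token to the data URL: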
import json

import requests
from bs4 import BeautifulSoup

# Form fields sent with the POST request; the token is filled in below
data = {
    "sort": "Einfahrtzeit-desc",
    "page": "1",
    "pageSize": "10",
    "group": "",
    "filter": "",
    "__RequestVerificationToken": "",
    "locid": "1",
}

headers = {"X-Requested-With": "XMLHttpRequest"}

url = "https://www.laerm-monitoring.de/zug/"
api_url = "https://www.laerm-monitoring.de/zug/train_read"

with requests.Session() as s:
    # Load the page first to obtain the verification token (and cookies)
    soup = BeautifulSoup(s.get(url).content, "html.parser")
    data["__RequestVerificationToken"] = soup.select_one(
        '[name="__RequestVerificationToken"]'
    )["value"]
    # Request the table data as JSON
    data = s.post(api_url, data=data, headers=headers).json()

    # pretty print the data
    print(json.dumps(data, indent=4))
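If you want to keep the retry behaviour from the question, the same token-then-POST flow can run on a session with a retry adapter mounted. This is only a sketch: `make_retry_session`, `extract_token` and `fetch_trains` are names made up for illustration, and the URLs and form fields are copied from the snippet above, so adjust them if the site changes.

```python
import requests
from bs4 import BeautifulSoup
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

PAGE_URL = "https://www.laerm-monitoring.de/zug/"
API_URL = "https://www.laerm-monitoring.de/zug/train_read"


def make_retry_session(retries=3, backoff_factor=0.3):
    """Return a requests.Session with a retry adapter mounted (hypothetical helper)."""
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,
        status_forcelist=(429, 500, 502, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


def extract_token(html):
    """Read the hidden __RequestVerificationToken value out of the page HTML."""
    soup = BeautifulSoup(html, "html.parser")
    field = soup.select_one('[name="__RequestVerificationToken"]')
    if field is None:
        raise ValueError("no __RequestVerificationToken field found in the page")
    return field["value"]


def fetch_trains(session, page=1, page_size=10):
    """Fetch one page of table data as decoded JSON (performs network requests)."""
    token = extract_token(session.get(PAGE_URL).content)
    form = {
        "sort": "Einfahrtzeit-desc",
        "page": str(page),
        "pageSize": str(page_size),
        "group": "",
        "filter": "",
        "__RequestVerificationToken": token,
        "locid": "1",
    }
    response = session.post(
        API_URL, data=form, headers={"X-Requested-With": "XMLHttpRequest"}
    )
    response.raise_for_status()
    return response.json()
```

Calling `fetch_trains(make_retry_session())` should then give the same JSON as the snippet above. As far as I know, urllib3's default retry policy does not retry POST requests on the listed status codes, so the status-based retries mainly protect the initial GET.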
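The `page` and `pageSize` fields suggest that the endpoint is paginated, so more rows could probably be collected by requesting page after page until one comes back empty. The sketch below shows only that loop. `iter_rows` and `fetch_page` are hypothetical names, and the `"Data"` key is an assumption about the JSON layout (the response is not shown here), so check the printed JSON first and pass the matching `rows_key`.

```python
def iter_rows(fetch_page, rows_key="Data", max_pages=100):
    """Yield rows page by page until a page contains no rows.

    fetch_page: callable taking a 1-based page number and returning the
                decoded JSON response as a dict.
    rows_key:   key holding the list of rows (ASSUMPTION, verify it against
                the real response).
    max_pages:  safety limit so the loop always terminates.
    """
    for page in range(1, max_pages + 1):
        payload = fetch_page(page)
        rows = payload.get(rows_key) or []
        if not rows:
            break
        yield from rows


# Stand-in for the real request so the loop can be tried offline.
# These rows are dummy values, not data from the site.
_fake_pages = {1: [{"row": 1}, {"row": 2}], 2: [{"row": 3}]}


def fake_fetch_page(page):
    return {"Data": _fake_pages.get(page, [])}


rows = list(iter_rows(fake_fetch_page))
print(len(rows))  # 3
```

With the real site, `fetch_page` would wrap the `s.post(...)` call from the answer, with `data["page"]` set to the requested page number.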