Posted by MR1

How to skip a page if the request returns 404 or 505

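I wrote a scraper in Python. Unfortunately, when the scraper hits a page that returns 404 or 505, it stops working. How can I skip those pages inside the loop so the scraper keeps running?

Here is my code: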

import requests
from bs4 import BeautifulSoup
import time
c = int(40622)
a = 10
for a in range(10):
    url = 'https://example.com/rockery/'+str(c)
    c = int(c) + 1
    print('-------------------------------------------------------------------------------------')
    print(url)
    print(c)
    time.sleep(5)
    response = requests.get(url)
    html = response.content
    soup = BeautifulSoup(html, "html.parser")
    name = soup.find('a', attrs={'class': 'name-hyperlink'})
    name_final = name.text

    name_details = soup.find('div', attrs={'class': 'post-text'})
    name_details_final = name_details.text

    name_taglist = soup.find('div', attrs={'class': 'post-taglist'})
    name_taglist_final = name_taglist.text

    name_accepted_tmp = soup.find('div', attrs={'class': 'accepted-name'})
    name_accepted = name_accepted_tmp.find('div', attrs={'class': 'post-text'}) …

python web-scraping python-3.6
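
One possible approach, sketched under assumptions rather than taken from an accepted answer: check `response.status_code` before parsing and `continue` past anything that is not 200, so the `soup.find(...)` calls never run on an error page. The helper name `fetch_pages` and its parameters (`base_url`, `start_id`, `count`, `get`, `delay`) are hypothetical and exist only for this illustration; the 10-second timeout is also an assumption.

```python
import time

import requests


def fetch_pages(base_url, start_id, count, get=requests.get, delay=5):
    """Yield (url, response) for every page that answers with HTTP 200.

    Pages that return an error status (404, 505, ...) or that fail at the
    network level are reported and skipped instead of stopping the loop.
    `get` is injectable so the loop can be exercised without network access.
    """
    for page_id in range(start_id, start_id + count):
        url = base_url + str(page_id)
        time.sleep(delay)  # pause before each request, as in the original loop
        try:
            response = get(url, timeout=10)
        except requests.RequestException as exc:
            # Connection errors, timeouts, etc.: skip this page.
            print(f"skipping {url}: request failed ({exc})")
            continue
        if response.status_code != 200:
            # 404, 505 and any other non-200 status: skip this page.
            print(f"skipping {url}: HTTP {response.status_code}")
            continue
        yield url, response
```

With this in place, the original loop body could become `for url, response in fetch_pages('https://example.com/rockery/', 40622, 10):` followed by the existing `BeautifulSoup(response.content, "html.parser")` parsing. If a 200 page can still lack one of the expected elements, each `soup.find(...)` result would also need a `None` check before reading `.text`.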

Score: -1 · Answers: 1 · Views: 1014
