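I want to collect some post titles from Reddit for analysis. After repeatedly debugging my code, I was able to get the titles of some posts. Then, suddenly, I started getting 403 Forbidden when trying to collect posts with PRAW. The explanation I found online is: "Access to the page or resource you were trying to reach is absolutely forbidden. In other words, a 403 error means you do not have permission to view whatever you are trying to access." What should I do? Thanks.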
Try adding some request headers and a time delay between requests:
```python
import random
import time
import urllib.request

url = "https://www.reddit.com"
my_headers = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Safari/605.1.15",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31",
]


def get_content(url, headers):
    # Wait 1-3 seconds before each request so requests are not fired back to back
    time.sleep(random.uniform(1, 3))
    # Pick a random User-Agent string for this request
    random_header = random.choice(headers)
    req = urllib.request.Request(url)
    req.add_header("User-Agent", random_header)
    req.add_header("Referer", "https://www.reddit.com")
    # urlopen() sends a GET request and sets the Host header automatically
    with urllib.request.urlopen(req) as resp:
        return resp.read()


print(get_content(url, my_headers))
```
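If the 403 keeps coming back even with rotated headers, one common pattern is to wait progressively longer before retrying. Below is a minimal sketch, assuming a hypothetical helper `fetch_with_backoff` (it is not part of PRAW or urllib) that retries on HTTP 403 or 429 using exponential backoff plus random jitter. The delay values, the attempt count, and the choice to treat 403 as retryable are all assumptions. Note that a 403 can also mean the request is genuinely not permitted, in which case waiting will not help.

```python
import random
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call fetch(), retrying on HTTP 403/429 with backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            # Retry only on Forbidden / Too Many Requests; give up on the last attempt
            if err.code not in (403, 429) or attempt == max_attempts - 1:
                raise
            # Delay grows as base_delay * 2**attempt, plus up to 1 s of random jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
    raise AssertionError("unreachable")
```

For example, `fetch_with_backoff(lambda: get_content(url, my_headers))` wraps the function from the answer. The `sleep` parameter exists only so the waiting can be replaced in tests.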