Post by tot*_*hem

How do I fix a Newspaper3k 403 Client Error for certain URLs?

I am trying to get a list of articles using a combination of the googlesearch and newspaper3k Python packages. When I call article.parse(), I eventually get this error: newspaper.article.ArticleException: Article `download()` failed with 403 Client Error: Forbidden for url: https://www.newsweek.com/donald-trump-hillary-clinton-2020-rally-orlando-1444697 on URL https://www.newsweek.com/donald-trump-hillary-clinton-2020-rally-orlando-1444697

I have tried running the script as administrator, and the link works when opened directly in a browser.

Here is my code:

import googlesearch
from newspaper import Article

query = "trump"
urlList = []

for j in googlesearch.search_news(query, tld="com", num=500, stop=200, pause=.01):
    urlList.append(j)

print(urlList)

articleList = []

for i in urlList:
    article = Article(i)
    article.download()
    article.html
    article.parse()
    articleList.append(article.text)
    print(article.text)

Here is my full error output:

Traceback (most recent call last):
  File "C:/Users/andre/PycharmProjects/StockBot/WebCrawlerTest.py", line 31, in <module>
    article.parse()
  File "C:\Users\andre\AppData\Local\Programs\Python\Python37\lib\site-packages\newspaper\article.py", line 191, in parse
    self.throw_if_not_downloaded_verbose()
  File "C:\Users\andre\AppData\Local\Programs\Python\Python37\lib\site-packages\newspaper\article.py", line …

python url screen-scraping web python-newspaper

8 votes · 1 answer · 2274 views
