Newspaper3k API: Article download() failed with HTTPSConnectionPool port=443 Read timed out. (read timeout=7) on URL

Question

Mon*_*lal 1 python https timeout python-3.x newspaper3k

I can view http://www.chicagotribune.com/ct-florida-school-shooter-nikolas-cruz-20180217-story.html when browsing in Firefox. However, newspaper3k gives me this error:

Article download() failed with HTTPSConnectionPool(host='www.chicagotribune.com', port=443): Read timed out. (read timeout=7) on URL http://www.chicagotribune.com/ct-florida-school-shooter-nikolas-cruz-20180217-story.html

My code is:

from newspaper import Article
from newspaper import Config

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
config = Config()

config.browser_user_agent = user_agent

url = "https://www.chicagotribune.com/nation-world/ct-florida-school-shooter-nikolas-cruz-20180217-story.html"

page = Article(url, config=config)


page.download()
page.parse()
print(page.text)

I thought something like "renewIPAddress()" might help, but I'm not sure exactly how to fit it into this code. /sf/answers/3534773791/

Answer 1

Lif*_*lex 7

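You have probably solved this already. Your code works fine, but something that happened at that particular moment caused the "Read timed out" error. I have found that Newspaper connections occasionally time out, because it uses the Python requests module. These timeouts are usually tied to the source you are querying. newspaper3k does support a timeout parameter in Config(), request_timeout, which can help prevent "Read timed out" problems in the future.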

from newspaper import Article
from newspaper import Config

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'

config = Config()
config.browser_user_agent = user_agent
config.request_timeout = 10  # seconds; the error in the question shows a read timeout of 7

url = "https://www.chicagotribune.com/nation-world/ct-florida-school-shooter-nikolas-cruz-20180217-story.html"

page = Article(url, config=config)

page.download()
page.parse()
print(page.text)

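Because the answer describes these timeouts as occasional and tied to the source, a longer timeout can be combined with a few retries. The following is only a minimal sketch, not part of newspaper3k: retry_call and fetch_article_text are hypothetical helper names, and the attempt count and delays are arbitrary. It also assumes, based on the error message in the question, that a failed download surfaces as an ArticleException when parse() is called.

```python
import time


def retry_call(func, attempts=3, base_delay=1.0,
               exceptions=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    The wait doubles after each failure (base_delay, 2*base_delay, ...).
    The last failure is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def fetch_article_text(url, timeout=10, attempts=3):
    """Hypothetical wrapper: download and parse an article with retries."""
    # Imported here so retry_call can be used without newspaper3k installed.
    from newspaper import Article, Config
    from newspaper.article import ArticleException

    config = Config()
    config.request_timeout = timeout  # same setting as in the answer above

    def attempt():
        page = Article(url, config=config)
        page.download()
        # Assumption: parse() raises ArticleException if the download failed.
        page.parse()
        return page.text

    return retry_call(attempt, attempts=attempts,
                      exceptions=(ArticleException,))
```

With this in place, print(fetch_article_text(url)) would replace the download(), parse() and print() calls. If the site blocks or throttles the client, retries alone will not help, so this only covers transient timeouts.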
