Mon*_*lal 1 python https timeout python-3.x newspaper3k
When browsing in Firefox, I can see http://www.chicagotribune.com/ct-florida-school-shooter-nikolas-cruz-20180217-story.html. However, newspaper3k gives me this error:
Article download() failed with HTTPSConnectionPool(host='www.chicagotribune.com', port=443): Read timed out. (read timeout=7) on URL http://www.chicagotribune.com/ct-florida-school-shooter-nikolas-cruz-20180217-story.html
My code is:
from newspaper import Article
from newspaper import Config
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
config = Config()
config.browser_user_agent = user_agent
url = "https://www.chicagotribune.com/nation-world/ct-florida-school-shooter-nikolas-cruz-20180217-story.html"
page = Article(url, config=config)
page.download()
page.parse()
print(page.text)
I think something like "renewIPAddress()" might help, but I'm not sure exactly how to fit it into this code. /sf/answers/3534773791/
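You may have already solved this issue. Your code works fine, but something happened at that particular moment that caused the "Read timed out" error. I have found that newspaper3k connections occasionally time out; it fetches pages with the Python requests module, and these timeouts are usually tied to the source you are querying. newspaper3k does support a timeout setting in Config() (request_timeout), which helps prevent future "Read timed out" errors. The code below raises it to 10 seconds, up from the 7 seconds shown in your error message.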
from newspaper import Article
from newspaper import Config
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
config = Config()
config.browser_user_agent = user_agent
config.request_timeout = 10
url = "https://www.chicagotribune.com/nation-world/ct-florida-school-shooter-nikolas-cruz-20180217-story.html"
page = Article(url, config=config)
page.download()
page.parse()
print(page.text)
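If the source still times out now and then, a longer timeout can be combined with a few retries. What follows is only a rough sketch, not something newspaper3k ships: `download_with_retries` and `fetch_article_text` are hypothetical helper names, and the attempt count and delays are arbitrary choices. It relies on the behavior seen in the question, where the failed download surfaces as an exception (as far as I can tell, it is raised when `parse()` is called after a failed `download()`).

```python
import time


def download_with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    Hypothetical helper: waits base_delay, then 2 * base_delay, and so on
    between failed attempts, and re-raises the last error if all fail.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:  # newspaper3k raises ArticleException here
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error


def fetch_article_text(url, timeout=10):
    """Download and parse one article with a custom request timeout."""
    # Imported lazily so the retry helper works without newspaper3k installed.
    from newspaper import Article, Config

    config = Config()
    config.request_timeout = timeout
    article = Article(url, config=config)
    article.download()
    article.parse()  # raises if the download above did not succeed
    return article.text
```

With the `url` from the code above, usage would look like `text = download_with_retries(lambda: fetch_article_text(url))`. Retrying only helps when the timeout is transient; if the site consistently refuses the connection, no number of retries will fix it.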