我正在尝试使用Python编写的爬虫来抓取网站.我想将Tor与Python集成,这意味着我想使用Tor匿名抓取该站点.
我试过这样做.它似乎不起作用.我检查了我的IP,它仍然与我使用tor之前的IP相同.我通过python检查了它.
import urllib2
proxy_handler = urllib2.ProxyHandler({"tcp":"http://127.0.0.1:9050"})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)
Run Code Online (Sandbox Code Playgroud) 我有一份我需要下载的学术论文标题清单.我想写一个循环来从网上下载他们的PDF文件,但找不到办法.
以下是我到目前为止所考虑的一步一步(答案是欢迎使用R或Python):
# Create list with paper titles (example with 4 papers from different journals)
titles <- c("Effect of interfacial properties on polymer–nanocrystal thermoelectric transport",
"Reducing social and environmental impacts of urban freight transport: A review of some major cities",
"Using Lorenz curves to assess public transport equity",
"Green infrastructure: The effects of urban rail transit on air quality")
#Loop step1 - Query paper title in Google Scholar to get URL of journal webpage containing the paper
#Loop step2 - …Run Code Online (Sandbox Code Playgroud)