Download all files from a folder on a website

Question

How can I download all the files in a folder on a website using R? I know how to download them one at a time, but not all at once. For example:

http://www2.census.gov/geo/docs/maps-data/data/rel/t00t10/

Answer 1

I tested this on a small subset (3) of the 56 files on the page, and it works fine.

## your base url
url <- "http://www2.census.gov/geo/docs/maps-data/data/rel/t00t10/"
## query the url to get all the file names ending in '.zip'
zips <- XML::getHTMLLinks(
    url, 
    xpQuery = "//a/@href['.zip'=substring(., string-length(.) - 3)]"
)
## create a new directory 'myzips' to hold the downloads
dir.create("myzips")
## save the current directory path for later
wd <- getwd()
## change working directory for the download
setwd("myzips")
## create all the new files
file.create(zips)
## download them all
lapply(paste0(url, zips), function(x) download.file(x, basename(x)))
## reset working directory to original
setwd(wd)

All of the zip files are now in the directory myzips and are ready for further processing. As an alternative to lapply(), you could also use a for() loop.

## download them all
for(u in paste0(url, zips)) download.file(u, basename(u))

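Of course, setting quiet = TRUE in download.file() may be a good idea, since we are downloading 56 files and would otherwise get progress output for each one.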
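The steps above could also be wrapped in a single function that never changes the working directory. This is only a sketch under assumptions: the helper name download_all and its arguments are hypothetical and not part of the answer. It passes quiet = TRUE as suggested and sets mode = "wb" so that zip files are written as binary.

```r
## Hypothetical helper (not from the answer): download every URL into
## `dest_dir` without calling setwd().
download_all <- function(urls, dest_dir = "myzips", quiet = TRUE) {
  ## create the target directory if it does not exist yet
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  ## one destination path per URL, named after the remote file
  dest <- file.path(dest_dir, basename(urls))
  ## download.file() returns 0 on success
  status <- vapply(seq_along(urls), function(i) {
    as.integer(download.file(urls[i], dest[i], quiet = quiet, mode = "wb"))
  }, integer(1))
  ## return the status codes, named by destination path
  invisible(setNames(status, dest))
}

## usage with the objects from the answer (needs network access):
## download_all(paste0(url, zips), "myzips")
```

Building the destination with file.path() avoids the setwd()/getwd() round trip, so an error in the middle of the downloads cannot leave the session in the wrong directory.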

Answer 2

hrb*_*str 5

A slightly different approach.

library(rvest)
library(httr)
library(pbapply)
library(stringi)

URL <- "http://www2.census.gov/geo/docs/maps-data/data/rel/t00t10/"

pg <- read_html(URL)
zips <- grep("zip$", html_attr(html_nodes(pg, "a[href^='TAB']"), "href"), value=TRUE)

invisible(pbsapply(zips, function(zip_file) {
  GET(URL %s+% zip_file, write_disk(zip_file))
}))

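You get a progress bar and built-in "caching": write_disk() will not overwrite files that have already been downloaded. Note that with its default overwrite = FALSE it does this by raising an error when the target file already exists, so a second run stops at the first file that is already on disk.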

You can weave in Richard's excellent code for the directory creation and file checks.
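One way to combine the two answers might look like the sketch below. The helper names pending_files and download_missing are hypothetical and appear in neither answer. The idea is to create the directory first, then filter out files that already exist, so that write_disk() is only called for files that are still missing.

```r
## Hypothetical helpers, not part of either answer.

## Return the entries of `files` that are not yet present in `dest_dir`.
pending_files <- function(files, dest_dir) {
  files[!file.exists(file.path(dest_dir, files))]
}

## Download only the missing files with httr, writing them into `dest_dir`.
download_missing <- function(base_url, files, dest_dir = "myzips") {
  ## create the target directory if it does not exist yet
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  todo <- pending_files(files, dest_dir)
  for (f in todo) {
    httr::GET(paste0(base_url, f), httr::write_disk(file.path(dest_dir, f)))
  }
  ## return the names of the files that were requested in this run
  invisible(todo)
}

## usage with the objects defined above (needs network access and httr):
## download_missing(URL, zips)
```

Because the existence check happens before the request, rerunning the function simply skips files that are already on disk.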
