R有一个方便的跨平台tar()函数,可以tar和gzip文件.看来这个函数是为了整个目录而设计的.我希望使用此函数来tar和压缩目录的子集或单个文件.但是,我似乎无法做到这一点.我期待以下内容在当前工作目录中查找单个csv文件:
tar( "tst.tgz", "myCsv.csv", compression="gzip" )
Run Code Online (Sandbox Code Playgroud)
那么只能在目录上使用tar()函数吗?
我暂时通过创建一个临时目录,复制我的文件,然后整个临时目录来解决这个问题.但我希望有一个更简单的解决方案.这不需要复制文件,这对于大文件来说有些耗时.
Rei*_*son 10
我不认为这是可能的,因为你描述JD.该files参数传递到path的参数list.files,并因此它的工作原理是在目录中,而不是单个文件tarring的文件.
如果你准备编辑的内部函数,tar()可向做你想做的,通过调用摆弄list.files()内tar().有点小问题产生了tar2()下面的函数,它有额外的参数来控制list.files()返回的东西.使用此功能,我们可以通过这样的调用实现您想要的功能:
tar2("foo.tar", path = ".", pattern = "bar.csv", recursive = FALSE,
full.names = FALSE, all.files = FALSE)
Run Code Online (Sandbox Code Playgroud)
的all.files = FALSE,除非你有隐藏包含名称的文件可能是多余的"bar.csv".
该recursive = FALSE位只是停止功能搜索除了当前目录,这似乎是你想要的,并加快搜索,如果工作目录有很多文件和子文件夹.
这full.names = FALSE是关键.如果这样TRUE,则list.files()返回匹配的文件名"./bar.csv",这tar()将粘贴在tarball内的文件夹中.如果我们将其设置为FALSE,则list.files()返回"bar.csv",因此我们会根据请求获得包含单个CSV文件的tarball.
如果您的文件具有相似的名称并希望只找到所声明的文件名,请使用^和将其固定在模式中$,例如:
tar2("foo.tar", path = ".", pattern = "^bar.csv$", recursive = FALSE,
full.names = FALSE, all.files = FALSE)
Run Code Online (Sandbox Code Playgroud)
这是修改后的tar()功能tar2():
tar2 <- function (tarfile, files = NULL, compression = c("none", "gzip",
"bzip2", "xz"), compression_level = 6, tar = Sys.getenv("tar"),
pattern = NULL, all.files = TRUE, recursive = TRUE, full.names = TRUE)
{
if (is.character(tarfile)) {
TAR <- tar
if (nzchar(TAR) && TAR != "internal") {
flags <- switch(match.arg(compression), none = "cf",
gzip = "zcf", bzip2 = "jcf", xz = "Jcf")
cmd <- paste(TAR, flags, shQuote(tarfile), paste(shQuote(files),
collapse = " "))
return(invisible(system(cmd)))
}
con <- switch(match.arg(compression), none = file(tarfile,
"wb"), gzip = gzfile(tarfile, "wb", compress = compression_level),
bzip2 = bzfile(tarfile, "wb", compress = compression_level),
xz = xzfile(tarfile, "wb", compress = compression_level))
on.exit(close(con))
}
else if (inherits(tarfile, "connection"))
con <- tarfile
else stop("'tarfile' must be a character string or a connection")
files <- list.files(files, recursive = recursive, all.files = all.files,
full.names = full.names, pattern = pattern)
bf <- unique(dirname(files))
files <- c(bf[!bf %in% c(".", files)], files)
for (f in unique(files)) {
info <- file.info(f)
if (is.na(info$size)) {
warning(gettextf("file '%s' not found", f), domain = NA)
next
}
header <- raw(512L)
if (info$isdir && !grepl("/$", f))
f <- paste(f, "/", sep = "")
name <- charToRaw(f)
if (length(name) > 100L) {
if (length(name) > 255L)
stop("file path is too long")
s <- max(which(name[1:155] == charToRaw("/")))
if (is.infinite(s) || s + 100 < length(name))
stop("file path is too long")
warning("storing paths of more than 100 bytes is not portable:\n ",
sQuote(f), domain = NA)
prefix <- name[1:(s - 1)]
name <- name[-(1:s)]
header[345 + seq_along(prefix)] <- prefix
}
header[seq_along(name)] <- name
header[101:107] <- charToRaw(sprintf("%07o", info$mode))
uid <- info$uid
if (!is.null(uid) && !is.na(uid))
header[109:115] <- charToRaw(sprintf("%07o", uid))
gid <- info$gid
if (!is.null(gid) && !is.na(gid))
header[117:123] <- charToRaw(sprintf("%07o", gid))
size <- ifelse(info$isdir, 0, info$size)
header[137:147] <- charToRaw(sprintf("%011o", as.integer(info$mtime)))
if (info$isdir)
header[157L] <- charToRaw("5")
else {
lnk <- Sys.readlink(f)
if (is.na(lnk))
lnk <- ""
header[157L] <- charToRaw(ifelse(nzchar(lnk), "2",
"0"))
if (nzchar(lnk)) {
if (length(lnk) > 100L)
stop("linked path is too long")
header[157L + seq_len(nchar(lnk))] <- charToRaw(lnk)
size <- 0
}
}
header[125:135] <- charToRaw(sprintf("%011o", as.integer(size)))
header[258:262] <- charToRaw("ustar")
header[264:265] <- charToRaw("0")
s <- info$uname
if (!is.null(s) && !is.na(s)) {
ns <- nchar(s, "b")
header[265L + (1:ns)] <- charToRaw(s)
}
s <- info$grname
if (!is.null(s) && !is.na(s)) {
ns <- nchar(s, "b")
header[297L + (1:ns)] <- charToRaw(s)
}
header[149:156] <- charToRaw(" ")
checksum <- sum(as.integer(header))%%2^24
header[149:154] <- charToRaw(sprintf("%06o", as.integer(checksum)))
header[155L] <- as.raw(0L)
writeBin(header, con)
if (info$isdir || nzchar(lnk))
next
inf <- file(f, "rb")
for (i in seq_len(ceiling(info$size/512L))) {
block <- readBin(inf, "raw", 512L)
writeBin(block, con)
if ((n <- length(block)) < 512L)
writeBin(raw(512L - n), con)
}
close(inf)
}
block <- raw(512L)
writeBin(block, con)
writeBin(block, con)
invisible(0L)
}
Run Code Online (Sandbox Code Playgroud)