here() issue in R scripts

r0b*_*rts 9 cron r path knitr


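I am trying to understand how here() works in a portable way. Found it: see what works under "Final answer (TL;DR)" below. The bottom line is that here() is not that useful when running a script.R from the command line.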

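The way I understand it, with help from JBGruber: here() looks for the root directory of a project (e.g., an RStudio project, a Git project, or any other project defined with a .here file), starting at the current working directory and moving up until it finds one. If it finds nothing, it falls back to the full working directory, which for scripts run by cron defaults to my home directory. One could, of course, pass the directory as a parameter in the cron command, but that is rather cumbersome. The answers below provide good explanations, and I have summarised what I found most immediately useful under "Final answer (TL;DR)". But make no mistake, Nicola's answer is very good and helpful too.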
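To make the search order described above concrete, here is a minimal base-R sketch of the walk-up idea. It assumes only a `.here` marker file is checked; `find_root_sketch()` is a hypothetical name, and this is not how the here package is actually implemented (it recognises several kinds of project markers).

```r
# Hypothetical helper: walk up from `start` until a directory containing
# the marker file is found; fall back to `start` if none is found.
find_root_sketch <- function(start = getwd(), marker = ".here") {
  start <- normalizePath(start, winslash = "/", mustWork = TRUE)
  dir <- start
  repeat {
    if (file.exists(file.path(dir, marker))) return(dir)
    parent <- dirname(dir)
    if (identical(parent, dir)) return(start)  # reached the filesystem root
    dir <- parent
  }
}

# Demo in a throwaway tree: <root>/.here and <root>/a/b
root <- file.path(tempdir(), "find_root_demo")
dir.create(file.path(root, "a", "b"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, ".here"))

# Starting two levels below the marker still resolves to <root>
found <- find_root_sketch(file.path(root, "a", "b"))
print(found)
```

The key point is that the search starts from the working directory, not from the script's location, which is why a cron job starting in the home directory never reaches the `.here` file.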

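Original objective: write a set of R scripts, including an R Markdown .Rmd file, so that I can zip the directory, send it to someone else, and have it run on their computer, potentially a very low-end one such as a Raspberry Pi or old hardware running Linux.

Conditions:

  • can be run from the command line with Rscript
  • as above, but scheduled via cron
  • the main method for setting the working directory is set_here(): executed once from the console, after which the folder is portable because the .here file is included in the zipped directory
  • does not need RStudio, hence I do not want to use R projects
  • can also be run interactively from RStudio (development)
  • can be executed from shiny (I assume that will be fine if the conditions above are met)

I specifically do not want to create RStudio projects because, in my view, that requires installing and using RStudio, whereas I want my scripts to be as portable as possible and to run on low-resource, headless platforms.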

Sample code:

Let us assume the working directory is myGoodScripts, as follows:

/Users/john/src/myGoodScripts/

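To start development I would go to the above directory with setwd() and execute set_here() to create the .here file. The directory then holds two scripts, dataFetcherMailer.R and dataFetcher.Rmd, and a subdirectory bkp.

dataFetcherMailer.R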

library(here)
library(knitr)

basedir <- here()
# this is where here should give path to .here file

rmarkdown::render(paste0(basedir,"/dataFetcher.Rmd"))

# email the created report
# email_routine_with_gmailr(paste0(basedir,"/dataFetcher.pdf"))
# now substituted with verification that a pdf report was created
file.exists(paste0(basedir,"/dataFetcher.pdf"))

dataFetcher.Rmd

---
title: "Data collection control report"
author: "HAL"
date: "`r Sys.Date()`"
output: pdf_document
---

```{r setup, include=FALSE}
library(knitr)
library(here)

basedir <- here()

# in actual program this reads data from a changing online data source
df.main <- mtcars

# data backup
datestamp <- format(Sys.time(),format="%Y-%m-%d_%H-%M")
backupName <- paste0(basedir,"/bkp/dataBackup_",datestamp,".csv.gz")
write.csv(df.main, gzfile(backupName))
```

# This is data collection report

Yesterday's data total records: `r nrow(df.main)`. 

The basedir was `r basedir`

The current directory is `r getwd()`

The here path is `r here()`

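I expected the last three lines of the report to match. Even if getwd() did not match the other two, it should not matter, because here() would ensure an absolute base path.

Errors

Of course, the above does not work. It only works if I execute Rscript ./dataFetcherMailer.R from within the myGoodScripts/ directory itself.

My aim is to understand how to execute the scripts so that relative paths are resolved relative to the script's location and the script can be run from the command line independently of the current working directory. Right now I can run it from bash only after I cd into the directory containing the script. If I schedule cron to execute the script, the default working directory is /home/user and the script fails. My naive assumption, that basedir <- here() would give a point in the filesystem from which relative paths can be resolved regardless of the shell's current working directory, does not hold.

From RStudio without a prior setwd():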

here() starts at /home/user
Error in abs_path(input) : 
The file '/home/user/dataFetcher.Rmd' does not exist.

From bash with Rscript, when the cwd is not the script directory:

$ cd /home/user/src
$ Rscript ./myGoodScripts/dataFetcherMailer.R 
here() starts at /home/user/src
Error in abs_path(input) : 
The file '/home/user/src/dataFetcher.Rmd' does not exist.
Calls: <Anonymous> -> setwd -> dirname -> abs_path

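If someone could help me understand and resolve this problem, that would be fantastic. If another reliable way to set the base path without here() exists, I would love to know. In the end the script has to work from RStudio as well as from the command line/cron.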
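For reference, the workaround mentioned earlier, passing the base directory as a command-line argument, could look roughly like the sketch below. `base_dir_from_cli()` is a hypothetical helper, not part of here or any other package; it assumes the directory is the first trailing argument to Rscript and silently falls back to the working directory otherwise.

```r
# Hypothetical helper: use the first trailing command-line argument as the
# base directory; fall back to the current working directory if it is
# missing or does not name an existing directory.
base_dir_from_cli <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) >= 1 && dir.exists(args[1])) {
    normalizePath(args[1], winslash = "/")
  } else {
    normalizePath(getwd(), winslash = "/")
  }
}

# Simulates: Rscript dataFetcherMailer.R <an existing directory>
demo_dir <- file.path(tempdir(), "cli_demo")
dir.create(demo_dir, showWarnings = FALSE)
basedir <- base_dir_from_cli(demo_dir)

# Paths are then built from basedir instead of from here()
report_path <- file.path(basedir, "dataFetcher.Rmd")
print(report_path)
```

This works independently of the working directory, at the cost of repeating the directory in every cron entry, which is the cumbersome part.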

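Update since JBGruber's answer:

I modified the function a little so that it can return either the file name or the directory of the file. I am currently trying to modify it so that it works both when the .Rmd file is knit from RStudio and when it is run via an R file.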

here2 <- function(type = 'dir') {
  args <- commandArgs(trailingOnly = FALSE)
  if ("RStudio" %in% args) {
    filepath <- rstudioapi::getActiveDocumentContext()$path
  } else if ("interactive" %in% args) {
    file_arg <- "--file="
    filepath <- sub(file_arg, "", grep(file_arg, args, value = TRUE))
  } else if ("--slave" %in% args) {
    string <- args[6]
    mBtwSquotes <- "(?<=')[^']*[^']*(?=')"
    filepath <- regmatches(string,regexpr(mBtwSquotes,string,perl = T))
  } else if (any(startsWith(args, "--file="))) {
    file_arg <- "--file="
    filepath <- sub(file_arg, "", grep(file_arg, args, value = TRUE))
  } else {
    if (type == 'dir') {
      filepath <- '.'
      return(filepath)
    } else {
      filepath <- "error"
      return(filepath)
    }
  }
  if (type == 'dir') {
    filepath <- dirname(filepath)
  }  
  return(filepath)
}

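I found, however, that the commandArgs() values are inherited from the R script, i.e. they stay the same for the .Rmd document when it is knit from a script.R. Therefore only the base path of the script.R location can be used generically, not the file name. In other words, this function, when placed in a .Rmd file, points to the path of the calling script.R, not to the path of the .Rmd file.

Final answer (TL;DR)

A shorter version of this function is therefore more useful: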

here2 <- function() {
  args <- commandArgs(trailingOnly = FALSE)
  if ("RStudio" %in% args) {
    # R script called from Rstudio with "source file button"
    filepath <- rstudioapi::getActiveDocumentContext()$path
  } else if ("--slave" %in% args) {
    # Rmd file called from Rstudio with "knit button"  
    # (if we placed this function in a .Rmd file)
    file_arg <- "rmarkdown::render"
    string <- grep(file_arg, args, value = TRUE)
    mBtwQuotes <- "(?<=')[^']*[^']*(?=')"
    filepath <- regmatches(string,regexpr(mBtwQuotes,string,perl = T))
  } else if ((sum(grepl("--file=" ,args))) >0) {
    # called in some other way that passes --file= argument
    # R script called via cron or commandline using Rscript
    file_arg <- "--file="
    filepath <- sub(file_arg, "", grep(file_arg, args, value = TRUE))
  } else if (sum(grepl("rmarkdown::render" ,args)) >0 ) {
    # Rmd file called to render from commandline with 
    # Rscript -e 'rmarkdown::render("RmdFileName")'
    file_arg <- "rmarkdown::render"
    string <- grep(file_arg, args, value = TRUE)
    mBtwQuotes <- "(?<=\")[^\"]*[^\"]*(?=\")"
    filepath <- regmatches(string,regexpr(mBtwQuotes,string,perl = T))
  } else {
    # we do not know what is happening; taking a chance; could have  error later
    filepath <- normalizePath(".")
    return(filepath)
  }
  filepath <- dirname(filepath)
  return(filepath)
}

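Note: from within a .Rmd file, normalizePath(".") is enough to get the directory containing the file. It works whether you call the .Rmd file from a script, from the command line, or from RStudio.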

JBG*_*ber 4

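What you asked for

I think the behaviour of here() isn't really what you want here. Instead, what you are looking for is a way to determine the path of the source file, i.e. the .R file. I extended the here() command a little so that it behaves the way you expect: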

here2 <- function() {
  args <- commandArgs(trailingOnly = FALSE)
  if ("RStudio" %in% args) {
    dirname(rstudioapi::getActiveDocumentContext()$path)
  } else {
    file_arg <- "--file="
    filepath <- sub(file_arg, "", grep(file_arg, args, value = TRUE))
    dirname(filepath)
  }
}

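The idea for the case where the script is not run in RStudio came from this answer. I tried it by pasting the function definition at the beginning of your dataFetcherMailer.R file. You could also put it in another file in your home directory and load it with, e.g., source("here2.R") instead of library(here), or you could write a small R package for this purpose.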
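The `--file=` branch can be tried without launching Rscript if the argument vector is passed in as a parameter. The sketch below is a hypothetical variant written for illustration (`script_dir_from_args()` is not part of the answer's code), and the example vector only imitates what Rscript typically hands to R.

```r
# Hypothetical variant of the --file= branch of here2(): the argument vector
# is a parameter, so the parsing can be tested in an interactive session.
script_dir_from_args <- function(args = commandArgs(trailingOnly = FALSE)) {
  file_arg <- "--file="
  hits <- args[startsWith(args, file_arg)]
  if (length(hits) == 0) return(NA_character_)  # e.g. an interactive session
  # Strip the "--file=" prefix from the first match and keep its directory
  dirname(substring(hits[1], nchar(file_arg) + 1))
}

# Illustrative argument vector, similar to what Rscript passes to R
example_args <- c("R", "--no-echo", "--no-restore",
                  "--file=/home/user/src/myGoodScripts/dataFetcherMailer.R")
script_dir <- script_dir_from_args(example_args)
print(script_dir)
```

When the script is started with a relative path, the value after `--file=` is relative as well, so wrapping the result in normalizePath() before any setwd() call is a sensible precaution.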

Final version by r0berts (OP)

here2 <- function() {
  args <- commandArgs(trailingOnly = FALSE)
  if ("RStudio" %in% args) {
    # R script called from Rstudio with "source file button"
    filepath <- rstudioapi::getActiveDocumentContext()$path
  } else if ("--slave" %in% args) {
    # Rmd file called from Rstudio with "knit button"  
    # (if we placed this function in a .Rmd file)
    file_arg <- "rmarkdown::render"
    string <- grep(file_arg, args, value = TRUE)
    mBtwQuotes <- "(?<=')[^']*[^']*(?=')"
    filepath <- regmatches(string,regexpr(mBtwQuotes,string,perl = T))
  } else if ((sum(grepl("--file=" ,args))) >0) {
    # called in some other way that passes --file= argument
    # R script called via cron or commandline using Rscript
    file_arg <- "--file="
    filepath <- sub(file_arg, "", grep(file_arg, args, value = TRUE))
  } else if (sum(grepl("rmarkdown::render" ,args)) >0 ) {
    # Rmd file called to render from commandline with 
    # Rscript -e 'rmarkdown::render("RmdFileName")'
    file_arg <- "rmarkdown::render"
    string <- grep(file_arg, args, value = TRUE)
    mBtwQuotes <- "(?<=\")[^\"]*[^\"]*(?=\")"
    filepath <- regmatches(string,regexpr(mBtwQuotes,string,perl = T))
  } else {
    # we do not know what is happening; taking a chance; could have  error later
    filepath <- normalizePath(".")
    return(filepath)
  }
  filepath <- dirname(filepath)
  return(filepath)
}

What I think most people actually need

I found this way a while ago but then actually changed my workflow entirely to only use R Markdown files (and RStudio projects). One of the advantages of this is that the working directory of Rmd files is always the location of the file. So instead of bothering with setting a working directory, you can just write all paths in your script relative to the Rmd file location.

---
title: "Data collection control report"
author: "HAL"
date: "`r Sys.Date()`"
output: pdf_document
---

```{r setup, include=FALSE}
library(knitr)

# in actual program this reads data from a changing online data source
df.main <- mtcars

# data backup
datestamp <- format(Sys.time(),format="%Y-%m-%d_%H-%M")

# create bkp folder if it doesn't exist
if (!dir.exists("./bkp/")) dir.create("./bkp/")

backupName <- paste0("./bkp/dataBackup_", datestamp, ".csv.gz")
write.csv(df.main, gzfile(backupName))
```

# This is data collection report

Yesterday's data total records: `r nrow(df.main)`. 

The current directory is `r getwd()`

Note that paths starting with ./ mean to start in the folder of the Rmd file. ../ means you go one level up. ../../ you go two levels up and so on. So if your Rmd file is in a folder called "scripts" in your root folder, and you want to save your data in a folder called "data" in your root folder, you write saveRDS(data, "../data/dat.RDS").
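As a small sanity check of the `../` convention, the following sketch builds a throwaway layout under tempdir() and temporarily switches the working directory to imitate what knitting does. The folder names "scripts" and "data" come from the example above; everything else is illustrative.

```r
# Throwaway layout: <root>/scripts and <root>/data
root <- file.path(tempdir(), "relpath_demo")
dir.create(file.path(root, "scripts"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "data"), showWarnings = FALSE)

# Imitate an Rmd chunk whose working directory is <root>/scripts
old_wd <- setwd(file.path(root, "scripts"))
saveRDS(mtcars, "../data/dat.RDS")   # one level up, then into data/
resolved <- normalizePath("../data/dat.RDS", winslash = "/")
setwd(old_wd)                        # restore the previous working directory

print(resolved)
```

The file lands in the sibling data folder even though the code never mentions an absolute path.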

You can run the Rmd file from command line/cron with Rscript -e 'rmarkdown::render("/home/johannes/Desktop/myGoodScripts/dataFetcher.Rmd")'.