Remove YAML from child documents in knitr

Tom*_*Tom 3 yaml r rstudio knitr r-markdown

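I am writing a set of related documents in rmarkdown that I will compile into a website with jekyll. In doing so, I have run into a problem:

Some of the Rmd files I am using call other Rmd files as child documents. When I render with knitr, the resulting document contains the YAML front matter from both the parent and the child documents. An example is given below.

So far I have not seen any way to include only part of a child document when that document is an Rmd. Does anyone know of a way to strip the YAML from the child documents as they are read into the parent Rmd during knit()?

I am happy to consider answers outside of R, preferably something I could embed in a rakefile. However, I do not want to alter the child files permanently, so the YAML stripping must not be permanent. Finally, the length of the YAML varies from file to file, so I am guessing any solution will need to find the beginning and end of the YAML via regex/grep/sed/etc...

Example: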

%%%% Parent_Doc.rmd %%%%

 ---
 title: parent doc
 layout: default 
 etc: etc
 ---
 This is the parent...

 ```{r child import, child="./child_doc."}
 ```

%%%% child_doc.rmd %%%%

 ---
 title: child doc
 layout: default 
 etc: etc
 ---

 lorem ipsum etc

%%%% output.md %%%%

 ---
 title: parent doc
 layout: default 
 etc: etc
 ---
 This is the parent...
 ---
 title: child doc
 layout: default 
 etc: etc
 ---

 lorem ipsum etc

%%%% Ideal output.md %%%%

 ---
 title: parent doc
 layout: default 
 etc: etc
 ---
 This is the parent...

 lorem ipsum etc

use*_*114 5

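In the meantime, perhaps the following will work for you. It is an ugly and inefficient workaround (I am new to knitr and not a real programmer), but it achieves what I believe you want to do.

I wrote a function for a similar personal use that includes the following relevant bits. The original is in Spanish, so I have translated some of it below: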

extraction <- function(matter, escape = FALSE, ruta = ".", patron) {

  require(yaml)

  # Gather together directory of documents to be processed

  doc_list <- list.files(
    path = ruta,
    pattern = patron,
    full.names = TRUE
    )

  # Extract desired contents

  lapply(
    X = doc_list,
    FUN = function(i) {
      raw_contents <- readLines(con = i, encoding = "UTF-8")

      switch(
        EXPR = matter,

        # !YAML (e.g., HTML)

        "no_yaml" = {

          if (escape == FALSE) {

            paste(raw_contents, sep = "", collapse = "\n")

          } else if (escape == TRUE) {

            require(XML)
            to_be_escaped <- paste(raw_contents, sep = "", collapse = "\n")
            xmlTextNode(value = to_be_escaped)

          }

        },

        # YAML header and Rmd contents

        "rmd" = {
          # Only lines consisting solely of '---' or '...' delimit the header
          yaml_pattern <- "^(-{3}|[.]{3})[[:space:]]*$"
          limits_yaml <- grep(pattern = yaml_pattern, x = raw_contents)[1:2]
          indices_yaml <- seq(
            from = limits_yaml[1] + 1,
            to = limits_yaml[2] - 1
            )
          # Parse the header in one go so multi-line YAML values stay intact
          yaml <- yaml.load(
            string = paste(raw_contents[indices_yaml], collapse = "\n")
            )
          indices_rmd <- seq(
            from = limits_yaml[2] + 1,
            to = length(x = raw_contents)
            )
          rmd<- paste(raw_contents[indices_rmd], sep = "", collapse = "\n")
          c(yaml, "contents" = rmd)
        },

        # Anything else (just in case)

        {
          stop("Matter not extractable")
        }

      )

    }
    )

}
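The YAML-splitting step inside the "rmd" branch above can be isolated as a small helper. The following is only a minimal sketch: the name `split_front_matter` is hypothetical (it is not part of the function above), and it assumes that the front matter, if present, opens on the first line with `---` and closes with a line made up of `---` or `...`.

```r
# Hypothetical helper (not from the function above): split the lines of an
# Rmd file into its YAML front matter and its body.
# Assumption: front matter, if any, opens on line 1 with '---'.
split_front_matter <- function(lines) {
  delimiter <- "^(---|[.]{3})[[:space:]]*$"
  has_header <- length(lines) > 0 && grepl("^---[[:space:]]*$", lines[1])
  if (!has_header) {
    return(list(yaml = character(0), body = lines))
  }
  # The first delimiter line after line 1 closes the header
  closing <- grep(delimiter, lines[-1])
  if (length(closing) == 0) {
    return(list(yaml = character(0), body = lines))
  }
  end <- closing[1] + 1  # index in 'lines' of the closing delimiter
  list(
    yaml = lines[seq_len(end - 2) + 1],  # lines between the two delimiters
    body = lines[-seq_len(end)]          # everything after the header
  )
}

child <- c(
  "---", "title: child doc", "layout: default", "---",
  "", "lorem ipsum etc"
)
parts <- split_front_matter(child)
parts$yaml  # the two header lines, without the '---' delimiters
parts$body  # the blank line and the text that follow the header
```

Because the helper works on a character vector rather than on the file itself, the child file on disk is never modified.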

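Say my main Rmd file, main.Rmd, lives in my_directory, and my child files, 01-abstract.Rmd, 02-intro.Rmd, ..., 06-conclusion.Rmd, are housed in ./sections. Note that for my amateur function it is best to have the child documents saved in the order in which they are brought into the main document (see below). My function lives in extraction.R under ./assets. This is the structure of my example directory: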

.
+--assets
|  +--extraction.R
+--sections
|  +--01-abstract.Rmd
|  +--02-intro.Rmd
|  +--03-methods.Rmd
|  +--04-results.Rmd
|  +--05-discussion.Rmd
|  +--06-conclusion.Rmd
+--stats
|  +--analysis.R
+--main.Rmd

In main.Rmd I import my child documents from ./sections:

---
title: Main
author: me
date: Today
output:
  html_document
---

```{r, 'setup', include = FALSE}
knitr::opts_chunk$set(autodep = TRUE)
knitr::dep_auto()
```

```{r, 'import_children', cache = TRUE, include = FALSE}
source('./assets/extraction.R')
rmd <- extraction(
  matter = 'rmd',
  ruta = './sections',
  patron = "\\.Rmd$"
  )
```

# Abstract

```{r, 'abstract', echo = FALSE, results = 'asis'}
cat(x = rmd[[1]][["contents"]], sep = "\n")
```

# Introduction

```{r, 'intro', echo = FALSE, results = 'asis'}
cat(x = rmd[[2]][["contents"]], sep = "\n")
```

# Methods

```{r, 'methods', echo = FALSE, results = 'asis'}
cat(x = rmd[[3]][["contents"]], sep = "\n")
```

# Results

```{r, 'results', echo = FALSE, results = 'asis'}
cat(x = rmd[[4]][["contents"]], sep = "\n")
```

# Discussion

```{r, 'discussion', echo = FALSE, results = 'asis'}
cat(x = rmd[[5]][["contents"]], sep = "\n")
```

# Conclusion

```{r, 'conclusion', echo = FALSE, results = 'asis'}
cat(x = rmd[[6]][["contents"]], sep = "\n")
```

# References

I then knit this document, and only the contents of my child documents are incorporated into it, e.g.:

---
title: Main
author: me
date: Today
output:
  html_document
---





# Abstract


This is **Child Doc 1**, my abstract.

# Introduction


This is **Child Doc 2**, my introduction.

- Point 1
- Point 2
- Point *n*

# Methods


This is **Child Doc 3**, my "Methods" section.

|    method 1   |    method 2   |   method *n*   |
|---------------|---------------|----------------|
| fffffffffffff | fffffffffffff | fffffffffffff d|
| fffffffffffff | fffffffffffff | fffffffffffff d|
| fffffffffffff | fffffffffffff | fffffffffffff d|

# Results


This is **Child Doc 4**, my "Results" section.

## Result 1

```{r}
library(knitr)
```

```{r, 'analysis', cache = FALSE}
source(file = '../stats/analysis.R')
```

# Discussion


This is **Child Doc 5**, where the results are discussed.

# Conclusion


This is **Child Doc 6**, where I state my conclusions.

# References

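The above is the knitted version of main.Rmd, i.e., main.md. Note that under ## Result 1, in my child document 04-results.Rmd, I source an external R script, ./stats/analysis.R, which now appears in my knitted document as a new knitr chunk. Consequently, I now need to knit the document again.

When the child documents also contain chunks, instead of knitting to .md I knit the main document into another .Rmd, as many times as I have levels of nested chunks, e.g., continuing the example above: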

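  1. Using knit(input = './main.Rmd', output = './main_2.Rmd'), instead of knitting main.Rmd into main.md, I knit it into another .Rmd, so that I can then knit the resulting file, which contains the newly imported chunks, e.g., the one sourcing my R script analysis.R above.
  2. I can now knit my main_2.Rmd into main.md, or render it as main.html via rmarkdown::render(input = './main_2.Rmd', output_file = './main.html').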
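The two steps above can be sketched as follows, assuming knitr is installed. The temporary directory and the toy main.Rmd and child.Rmd written here are stand-ins for illustration only; they are not the files from the example.

```r
library(knitr)

# Work in a throwaway directory so nothing real is overwritten
work_dir <- tempfile("two_pass_")
dir.create(work_dir)

fence <- strrep("`", 3)  # three backticks, built here to keep the listing readable

# Toy child document whose body contains a chunk of its own
writeLines(
  c("This is the child.", "", paste0(fence, "{r}"), "1 + 1", fence),
  file.path(work_dir, "child.Rmd")
)

# Toy main document: an 'asis' chunk that prints the child's text verbatim
writeLines(
  c(
    "# Results",
    "",
    paste0(fence, "{r import, echo = FALSE, results = 'asis'}"),
    "cat(readLines('child.Rmd'), sep = '\\n')",
    fence
  ),
  file.path(work_dir, "main.Rmd")
)

old_wd <- setwd(work_dir)
# Pass 1: the imported chunk arrives as plain text, so write another .Rmd
knit(input = "main.Rmd", output = "main_2.Rmd", quiet = TRUE)
# Pass 2: the imported chunk is now evaluated
knit(input = "main_2.Rmd", output = "main.md", quiet = TRUE)
setwd(old_wd)

intermediate <- readLines(file.path(work_dir, "main_2.Rmd"))
final <- readLines(file.path(work_dir, "main.md"))
```

After the first pass, `intermediate` still holds the child's chunk unevaluated; only `final` contains its result. With deeper nesting, the first step would be repeated once per level.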

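Note: in the example main.md above, the path to my R script is ../stats/analysis.R. That path is relative to the child document that sources it, ./sections/04-results.Rmd. Once I import the child document into the main document located at the root of my_directory, i.e., ./main.md or ./main_2.Rmd, the path is wrong, so I have to correct it manually to ./stats/analysis.R before the next knit.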
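That manual correction could in principle be scripted before the next knit. The following is a minimal sketch under stated assumptions: `rebase_path` is a hypothetical helper, and it only does a literal text substitution, so it assumes the path prefix appears in the imported text exactly as written.

```r
# Hypothetical helper: literal (non-regex) replacement of a path prefix
# inside imported chunk text
rebase_path <- function(text, from, to) {
  gsub(pattern = from, replacement = to, x = text, fixed = TRUE)
}

imported <- "source(file = '../stats/analysis.R')"

# Include the opening quote so that only a path starting with '../stats/'
# is rewritten
fixed <- rebase_path(imported, from = "'../stats/", to = "'./stats/")
fixed
```

The same call could be applied to the "contents" element of each imported document before it is printed into the main document.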

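I mentioned above that it is best to have the child documents saved in the same order in which they are imported into the main document. This is because my simple function extraction() just stores the contents of all the files passed to it in an unnamed list, so in main.Rmd I have to access each file by number, i.e., rmd[[5]][["contents"]] refers to the child document ./sections/05-discussion.Rmd. Consider: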

> str(rmd)
List of 6
 $ :List of 4
  ..$ title     : chr "child doc 1"
  ..$ layout    : chr "default"
  ..$ etc       : chr "etc"
  ..$ contents: chr "\nThis is **Child Doc 1**, my abstract."
 $ :List of 4
  ..$ title     : chr "child doc 2"
  ..$ layout    : chr "default"
  ..$ etc       : chr "etc"
  ..$ contents: chr "\nThis is **Child Doc 2**, my introduction.\n\n- Point 1\n- Point 2\n- Point *n*"
 $ :List of 4
  ..$ title     : chr "child doc 3"
  ..$ layout    : chr "default"
  ..$ etc       : chr "etc"
  ..$ contents: chr "\nThis is **Child Doc 3**, my \"Methods\" section.\n\n| method 1 | method 2 | method *n* |\n|--------------|--------------|----"| __truncated__
 $ :List of 4
  ..$ title     : chr "child doc 4"
  ..$ layout    : chr "default"
  ..$ etc       : chr "etc"
  ..$ contents: chr "\nThis is **Child Doc 4**, my \"Results\" section.\n\n## Result 1\n\n```{r}\nlibrary(knitr)\n```\n\n```{r, cache = FALSE}\nsour"| __truncated__
 $ :List of 4
  ..$ title     : chr "child doc 5"
  ..$ layout    : chr "default"
  ..$ etc       : chr "etc"
  ..$ contents: chr "\nThis is **Child Doc 5**, where the results are discussed."
 $ :List of 4
  ..$ title     : chr "child doc 6"
  ..$ layout    : chr "default"
  ..$ etc       : chr "etc"
  ..$ contents: chr "\nThis is **Child Doc 6**, where I state my conclusions."
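Positional access like rmd[[5]] could be avoided by naming the list elements after their files. This is a minimal sketch, not part of extraction() itself: the two-element list below is an illustrative stand-in for the value the function returns, and `doc_list` mimics what list.files() would give for ./sections.

```r
# File paths as list.files(path = './sections', full.names = TRUE) might return them
doc_list <- c("./sections/01-abstract.Rmd", "./sections/05-discussion.Rmd")

# Stand-in for the list returned by extraction(); values are illustrative
rmd <- list(
  list(title = "child doc 1",
       contents = "\nThis is **Child Doc 1**, my abstract."),
  list(title = "child doc 5",
       contents = "\nThis is **Child Doc 5**, where the results are discussed.")
)

# Name each element after its file, dropping the directory and the extension
names(rmd) <- tools::file_path_sans_ext(basename(doc_list))

# Access by name instead of by position
cat(rmd[["05-discussion"]][["contents"]], sep = "\n")
```

With names in place, the order of the files on disk no longer has to match the order of the sections in the main document.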

In short, extraction() stores the R Markdown contents of the specified child documents along with their YAML, in case you have a use for that too (I do myself).
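For the original question, an alternative that may avoid the second knit pass is to strip the front matter and hand the remaining lines to knitr::knit_child() through its text argument. This is a sketch under assumptions, not the method used above: the toy parent.Rmd and child.Rmd are hypothetical, and the child's front matter is assumed to start on line 1 and to end at the second `---` line.

```r
library(knitr)

work_dir <- tempfile("strip_yaml_")
dir.create(work_dir)
fence <- strrep("`", 3)  # three backticks, built here to keep the listing readable

# Toy child document with its own front matter
writeLines(
  c("---", "title: child doc", "layout: default", "---", "", "lorem ipsum etc"),
  file.path(work_dir, "child.Rmd")
)

# Toy parent: the chunk drops the child's header lines, then knits the rest
writeLines(
  c(
    "---", "title: parent doc", "layout: default", "---",
    "This is the parent...",
    "",
    paste0(fence, "{r child_import, echo = FALSE, results = 'asis'}"),
    "lines <- readLines('child.Rmd')",
    "delims <- grep('^---[[:space:]]*$', lines)",
    "body <- lines[-seq_len(delims[2])]",
    "cat(knitr::knit_child(text = body, quiet = TRUE), sep = '\\n')",
    fence
  ),
  file.path(work_dir, "parent.Rmd")
)

old_wd <- setwd(work_dir)
knit(input = "parent.Rmd", output = "output.md", quiet = TRUE)
setwd(old_wd)

output <- readLines(file.path(work_dir, "output.md"))
```

Here `output` should keep the parent's front matter and the child's body, while child.Rmd on disk is left untouched. Since knit_child() evaluates any chunks in the child as part of the same run, a second pass should not be needed in this variant.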