Importing a text file into R

Rag*_*aac 9 r

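I have a text file with more than 100,000 lines that I download from SAP every week. It is downloaded as pages, and every page repeats the same header and dashed lines. Below is a minimal example with two pages, each containing only two items: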

------------------------------------------------------------
|date              |Material          |Description         |
|----------------------------------------------------------|
|10/04/2013        |WM.5597394        |PNEUMATIC           |
|11/07/2013        |GB.D040790        |RING                |
------------------------------------------------------------

------------------------------------------------------------
|date              |Material          |Description         |
|----------------------------------------------------------|
|08/06/2013        |WM.4M01004A05     |TOUCHEUR            |
|08/06/2013        |WM.4M010108-1     |LEVER               |
------------------------------------------------------------
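To make the snippets in this thread reproducible without the real SAP export, the two-page sample above can be written to myfile.txt (the file name used in the question). This is only a convenience sketch; the variable name sample_lines is made up, and the real export will of course have many more pages.

```r
# Write the two-page sample from the question to "myfile.txt" so the
# parsing code can be tried without the real SAP download.
sample_lines <- c(
  "------------------------------------------------------------",
  "|date              |Material          |Description         |",
  "|----------------------------------------------------------|",
  "|10/04/2013        |WM.5597394        |PNEUMATIC           |",
  "|11/07/2013        |GB.D040790        |RING                |",
  "------------------------------------------------------------",
  "",
  "------------------------------------------------------------",
  "|date              |Material          |Description         |",
  "|----------------------------------------------------------|",
  "|08/06/2013        |WM.4M01004A05     |TOUCHEUR            |",
  "|08/06/2013        |WM.4M010108-1     |LEVER               |",
  "------------------------------------------------------------"
)
writeLines(sample_lines, "myfile.txt")
```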

What I want is to import this file into R with a single header and without the dashed lines. I tried:

read.table( "myfile.txt",  sep = "|", fill=TRUE)
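The read.table() call above reads every line of the file, so the dashed separators and the header repeated on each page would end up as data rows, and the leading and trailing "|" add empty columns. One way to keep read.table() is to filter the raw lines first. The sketch below assumes that "|" never occurs inside a field; parse_sap_lines is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: filter the raw lines, then parse them with read.table()
# using the same sep = "|" as in the question.
parse_sap_lines <- function(lines) {
  # Keep table rows only: lines that start with "|" and contain at least one
  # letter or digit (drops the dashed separators and blank lines)
  rows <- lines[grepl("^\\|", lines) & grepl("[[:alnum:]]", lines)]

  # Drop the header repeated on later pages, keeping its first copy
  rows <- rows[!(duplicated(rows) & rows == rows[1])]

  # quote = "" and comment.char = "" so that ' or # inside a description
  # are read literally; strip.white trims the padding around each field
  dat <- read.table(text = rows, sep = "|", header = TRUE,
                    strip.white = TRUE, quote = "", comment.char = "",
                    stringsAsFactors = FALSE)

  # The leading and trailing "|" create two all-NA columns; drop them
  dat[, colSums(!is.na(dat)) > 0, drop = FALSE]
}
```

With the real file this would be called as dat <- parse_sap_lines(readLines("myfile.txt")). Compared with splitting the lines by hand, this leaves field splitting and whitespace trimming to read.table().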

Many thanks.

Sve*_*ein 9

Another approach using readLines:

l <- readLines("myfile.txt")

# remove the dashed separator lines and empty lines
l <- grep("^\\|?-+\\|?$|^$", l, value = TRUE, invert = TRUE)

# remove duplicated headers
l2 <- c(l[1], l[-1][l[-1] != l[1]])

# split
lsplit <- strsplit(l2, "\\s*\\|")

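# create data frame: element 1 of lsplit is the header, and column 1 is the
# empty string before the leading "|"; stringsAsFactors = FALSE keeps
# character columns on R versions before 4.0
dat <- setNames(data.frame(do.call(rbind, lsplit[-1])[ , -1], stringsAsFactors = FALSE), lsplit[[1]][-1])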

# print the result
dat

        date      Material Description
1 10/04/2013    WM.5597394   PNEUMATIC
2 11/07/2013    GB.D040790        RING
3 08/06/2013 WM.4M01004A05    TOUCHEUR
4 08/06/2013 WM.4M010108-1       LEVER
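Since the export arrives every week, the readLines steps above can be wrapped in a single function. read_sap_export is a hypothetical name; the body follows the same sequence (filter, drop repeated headers, split on "|", build the data frame) and assumes the first remaining line is the header.

```r
# Hypothetical wrapper around the readLines/strsplit approach.
read_sap_export <- function(path) {
  l <- readLines(path)

  # Drop separator lines (runs of "-", optionally framed by "|") and blank lines
  l <- grep("^\\|?-+\\|?$|^$", l, value = TRUE, invert = TRUE)

  # Keep the first header and drop its repeats on later pages
  l <- c(l[1], l[-1][l[-1] != l[1]])

  # Split on "|" together with any padding spaces in front of it
  parts <- strsplit(l, "\\s*\\|")

  # Rows 2..n are data; column 1 is the empty string before the leading "|"
  m <- do.call(rbind, parts[-1])[, -1, drop = FALSE]
  colnames(m) <- parts[[1]][-1]
  as.data.frame(m, stringsAsFactors = FALSE)
}
```

Usage: dat <- read_sap_export("myfile.txt").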