How to read a file with a custom end-of-line (eol) in R

Question

Cor*_*ado 5 r eol read.table

I have a text file that I want to read in R (and store in a data.frame). The file is organized in rows and columns. Both "sep" and "eol" are custom.

Problem: the custom eol, i.e. "\t&nd" (without quotes), cannot be set in read.table(...) (or read.csv(...), read.csv2(...), ...) nor in fread(...), and I have not been able to find a solution.

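I searched here ("[r] read eol" and other queries I don't remember) and could not find a solution. The only suggestion was to preprocess the file and change the eol, which is not possible in my case: some fields contain line breaks such as \n or \r\n, quote characters ("), and so on, which is the reason for the custom eol in the first place.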

Thanks!

Answer 1

C8H*_*4O2 1

You can approach this in two different ways:

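A. If the file is not too wide, you can read the rows you need with scan, split them into columns with strsplit, and then combine the result into a data.frame. Example: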

# Create a reproducible example of the input file ("raw.txt" here)
your_text <- "a~b~c!1~2~meh!4~5~wow"
write(your_text, "raw.txt"); rm(your_text)

eol_str <- "!" # row delimiter; scan() requires sep to be a single character
sep_str <- "~" # column delimiter (passed to strsplit)

# Read and parse the text file.
# scan() returns a character vector with one string per row;
# strsplit() turns it into a list with one character vector of fields per row.
rows <- scan("raw.txt", what = character(), sep = eol_str, quiet = TRUE)
row_list <- strsplit(rows, split = sep_str, fixed = TRUE)

# The first row holds the column names; the remaining rows are data
df <- data.frame(do.call(rbind, row_list[-1]))
names(df) <- row_list[[1]]

df
#   a b   c
# 1 1 2 meh
# 2 4 5 wow
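
Note that scan only accepts a single character for sep and also breaks fields at newlines, so approach A does not directly cover a multi-character eol such as the "\t&nd" from the question. One possible workaround, sketched here under assumptions, is to read the whole file as a single string with readChar and split it with strsplit(fixed = TRUE). The sample data, the "|" column separator, and the temporary file are hypothetical and only serve the illustration.

```r
# Hypothetical sample: "|" separates columns, "\t&nd" ends rows,
# and the fields contain an embedded newline and quote characters.
raw <- "id|note\t&nd1|first\nline\t&nd2|say \"hi\"\t&nd"
path <- tempfile(fileext = ".txt")
writeChar(raw, path, eos = NULL)

eol_str <- "\t&nd" # multi-character row terminator
sep_str <- "|"     # assumed column separator

# Read the entire file into one string, then split on the literal eol.
txt <- readChar(path, file.info(path)$size, useBytes = TRUE)
rows <- strsplit(txt, eol_str, fixed = TRUE)[[1]]
rows <- rows[nzchar(rows)] # drops empty rows, if any

# Split each row into fields; fixed = TRUE avoids regex interpretation of "|".
# Caveat: strsplit drops a trailing empty field, hence the length check.
row_list <- strsplit(rows, sep_str, fixed = TRUE)
stopifnot(all(lengths(row_list) == length(row_list[[1]])))

# First row holds the column names; the rest are data
df <- as.data.frame(do.call(rbind, row_list[-1]), stringsAsFactors = FALSE)
names(df) <- row_list[[1]]
df <- type.convert(df, as.is = TRUE) # guess column types, keep strings as-is
```

Because everything is held in memory as one string, this is probably only practical for files that fit comfortably in RAM.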


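B. If A doesn't work, I agree with @BondedDust that you will probably need an external utility, but you can call it from within R using system() and do a find/replace to reformat your file for read.table. The call will be specific to your operating system. Example: https://askubuntu.com/questions/20414/find-and-replace-text-within-a-file-using-commands. Since you note that your text already contains \n and \r\n, I suggest you first find those and replace them with temporary placeholders (perhaps quoted versions of themselves), which you can then revert after building the data.frame.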
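
The placeholder idea in B can also be tried without leaving R, using gsub in place of an external find/replace tool. The following is a minimal sketch, not a tested solution for the asker's real file: the sample data, the "|" separator, and the "<LF>"/"<CR>" placeholder strings are assumptions, and the placeholders must not occur anywhere in the actual data.

```r
# Hypothetical sample: "|" separates columns, "\t&nd" ends rows,
# and one field contains an embedded newline.
raw <- "id|note\t&nd1|first\nline\t&nd2|plain\t&nd"
path <- tempfile(fileext = ".txt")
writeChar(raw, path, eos = NULL)

txt <- readChar(path, file.info(path)$size, useBytes = TRUE)

# Assumption: these placeholder strings never occur in the real data.
lf_mark <- "<LF>"
cr_mark <- "<CR>"

# 1. Hide embedded line breaks behind placeholders.
txt <- gsub("\n", lf_mark, txt, fixed = TRUE)
txt <- gsub("\r", cr_mark, txt, fixed = TRUE)
# 2. Turn the custom row terminator into an ordinary newline.
txt <- gsub("\t&nd", "\n", txt, fixed = TRUE)

# 3. Let read.table parse the now-regular text. Quoting and comments are
#    disabled so that stray " or # characters in fields are kept literally.
df <- read.table(text = txt, sep = "|", header = TRUE, quote = "",
                 comment.char = "", stringsAsFactors = FALSE)

# 4. Restore the original line breaks in the character columns.
chr <- vapply(df, is.character, logical(1))
df[chr] <- lapply(df[chr], function(x) {
  x <- gsub(lf_mark, "\n", x, fixed = TRUE)
  gsub(cr_mark, "\r", x, fixed = TRUE)
})
```

The order of the steps matters: the embedded line breaks have to be hidden before the custom eol is turned into "\n", otherwise the two could no longer be told apart.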

Archived:	10 years, 7 months ago
Views:	2302
Last activity:	10 years, 5 months ago