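I have a tab-delimited file containing multiple tables, each headed by a title, e.g. "Azuay \n", "Bolivar \n", "Cotopaxi \n", etc., and the tables are separated by two newlines. In R, how can I read the file and select only the table (i.e. the specified rows) corresponding to, say, "Bolivar", while ignoring the table below it for "Cotopaxi" and the table above it for "Azuay"?
NB. I would prefer not to modify the tables outside of R.
The data looks like this. The file is tab-delimited.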
Azuay
region begin stop
1A 2017761 148749885
1A 148863885 150111299
1A 150329391 150346152
1A 150432847 247191037
Bolivar
region begin stop
2A 2785 242068364
2A 736640 198339289
Cotopaxi
region begin stop
4A 2282 9951846
4A 11672561 11906166
This seems to do the job:
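read.entry.table <- function(file, entry) {
  lines <- readLines(file)
  # Locate the title line; trimws() tolerates a trailing space such as "Bolivar "
  table.entry <- trimws(lines) == entry
  if (sum(table.entry) != 1) stop(paste(entry, "must match exactly one title line"))
  # Blank lines end a table; the extra index past the last line closes the final table
  empty.lines <- which(lines == "")
  empty.lines <- c(empty.lines, length(lines) + 1L)
  table.start <- which(table.entry) + 1L
  table.end <- empty.lines[which(empty.lines > table.start)[1]] - 1L
  # Parse the header row and data rows between the title and the next blank line
  read.table(text = lines[seq(from = table.start, to = table.end)],
             header = TRUE)
}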
read.entry.table("test.txt", "Bolivar")
#   region  begin      stop
# 1     2A   2785 242068364
# 2     2A 736640 198339289
read.entry.table("test.txt", "Cotopaxi")
#   region    begin     stop
# 1     4A     2282  9951846
# 2     4A 11672561 11906166
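If more than one table is needed, the same idea (a title line, then a header row, then data rows, with a blank line between tables) can be used to read everything into a named list in one pass. The following is only a rough sketch: `read.all.tables` is a hypothetical helper name, not part of the answer above, and the sample file is an assumption built from the layout described in the question, including the trailing space after each title.

```r
# Hypothetical helper (not from the original answer): read every table at once.
read.all.tables <- function(file) {
  # Drop trailing whitespace so titles like "Azuay " become "Azuay"
  lines <- trimws(readLines(file), which = "right")
  keep <- lines != ""
  # Every blank line starts a new block, so cumsum() gives each table its own id
  block.id <- cumsum(!keep)
  blocks <- split(lines[keep], block.id[keep])
  # In each block the first line is the title; the rest is header plus data
  tables <- lapply(blocks, function(b) read.table(text = b[-1], header = TRUE))
  names(tables) <- vapply(blocks, function(b) b[1], character(1),
                          USE.NAMES = FALSE)
  tables
}

# Assumed sample file following the layout in the question
tmp <- tempfile(fileext = ".txt")
writeLines(c("Azuay ", "region\tbegin\tstop", "1A\t2017761\t148749885",
             "",
             "Bolivar ", "region\tbegin\tstop", "2A\t2785\t242068364",
             "2A\t736640\t198339289"), tmp)
tables <- read.all.tables(tmp)
tables[["Bolivar"]]
```

Selecting by name with `tables[["Bolivar"]]` then replaces the repeated calls to `read.entry.table`. This sketch assumes every block has at least a title and a header row; a block containing only a title would make `read.table` fail.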