Ste*_*ner 29 statistics text r list
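I have a large text file with a variable number of fields in each line. The first entry in each line corresponds to a biological pathway, and each subsequent entry corresponds to a gene in that pathway. The first few lines might look like this: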
path1   gene1 gene2
path2   gene3 gene4 gene5 gene6
path3   gene7 gene8 gene9
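I need to read this file into R as a list, where each element is a character vector and the name of each element is the first entry on the line, for example: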
> pathways <- list(
+     path1=c("gene1","gene2"), 
+     path2=c("gene3","gene4","gene5","gene6"),
+     path3=c("gene7","gene8","gene9")
+ )
> 
> str(pathways)
List of 3
 $ path1: chr [1:2] "gene1" "gene2"
 $ path2: chr [1:4] "gene3" "gene4" "gene5" "gene6"
 $ path3: chr [1:3] "gene7" "gene8" "gene9"
> 
> str(pathways$path1)
 chr [1:2] "gene1" "gene2"
> 
> print(pathways)
$path1
[1] "gene1" "gene2"
$path2
[1] "gene3" "gene4" "gene5" "gene6"
$path3
[1] "gene7" "gene8" "gene9"
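...but I need to do this automatically for thousands of lines. I've seen a similar question on here before, but I couldn't figure out how to do this from that thread.
Thanks in advance.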
Jos*_*ich 41
Here's one way to do it:
# Read in the data
x <- scan("data.txt", what="", sep="\n")
# Separate elements by one or more whitespace characters
y <- strsplit(x, "[[:space:]]+")
# Extract the first vector element and set it as the list element name
names(y) <- sapply(y, `[[`, 1)
#names(y) <- sapply(y, function(x) x[[1]]) # same as above
# Remove the first vector element from each list element
y <- lapply(y, `[`, -1)
#y <- lapply(y, function(x) x[-1]) # same as above
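The steps above can be wrapped in a small helper for reuse. This is only a minimal sketch: the function name `read_pathways` is hypothetical, it uses `readLines()` in place of `scan()`, and it assumes the fields are separated by whitespace as in the question's sample. It also drops blank lines and trims leading whitespace, which the question's sample does not strictly need.

```r
# Hypothetical helper: read a ragged, whitespace-delimited file into a named list
read_pathways <- function(file) {
  lines <- readLines(file)
  lines <- lines[nzchar(trimws(lines))]            # drop blank lines
  fields <- strsplit(trimws(lines), "[[:space:]]+")
  # The first field of each line becomes the element name; the rest are the genes
  setNames(lapply(fields, `[`, -1),
           vapply(fields, `[[`, character(1), 1))
}

# Try it on the sample from the question
con <- textConnection("path1   gene1 gene2
path2   gene3 gene4 gene5 gene6
path3   gene7 gene8 gene9")
pathways <- read_pathways(con)
close(con)
str(pathways)
```

For a file on disk, `read_pathways("data.txt")` should work the same way, since `readLines()` accepts either a file name or a connection.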
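One solution is to read the data in via read.table(), using the fill = TRUE argument to pad the rows that have fewer "entries", then convert the resulting data frame to a list and clean up the "empty" elements.
First, read in your snippet of data: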
con <- textConnection("path1   gene1 gene2
path2   gene3 gene4 gene5 gene6
path3   gene7 gene8 gene9
")
dat <- read.table(con, fill = TRUE, stringsAsFactors = FALSE)
close(con)
Next, we drop the first column, saving it first so it can be used later as the names of the list:
nams <- dat[, 1]
dat <- dat[, -1]
Convert the data frame to a list. Here I just split the data frame on the indices 1, 2, ..., n, where n is the number of rows:
ldat <- split(dat, seq_len(nrow(dat)))
Clean up the empty cells:
ldat <- lapply(ldat, function(x) x[x != ""])
Finally, apply the names:
names(ldat) <- nams
Giving:
> ldat
$path1
[1] "gene1" "gene2"
$path2
[1] "gene3" "gene4" "gene5" "gene6"
$path3
[1] "gene7" "gene8" "gene9"
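One caveat with the read.table() approach on a large file: read.table() works out the number of columns from the first five lines of input (or from col.names if that is longer), so a longer line further down may not fit the columns it has allocated. A possible workaround, sketched here on made-up sample data, is to count the fields on every line with count.fields() first and supply that many col.names. The names `txt`, `n` and `genes` are only illustrative.

```r
# Made-up sample in which the longest line is the sixth one
txt <- c("p1 g1", "p2 g2", "p3 g3", "p4 g4", "p5 g5", "p6 g6 g7 g8 g9")

# Maximum number of fields found on any line
con <- textConnection(txt)
n <- max(count.fields(con))
close(con)

# Supplying n column names makes read.table() allocate n columns
dat <- read.table(text = txt, fill = TRUE, stringsAsFactors = FALSE,
                  col.names = paste0("V", seq_len(n)))

# Same idea as above: drop the name column, then remove the padding per row
genes <- lapply(seq_len(nrow(dat)), function(i) {
  x <- unlist(dat[i, -1], use.names = FALSE)
  x[x != ""]
})
names(genes) <- dat[, 1]
str(genes)
```

For a real file, count.fields("data.txt") can be called on the file name directly. This reads the file twice, which may matter for very large inputs.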