I have a matrix like this (1000 x 2830):
9178 3574 3547
160 B_B B_B A_A
301 B_B A_B A_B
303 B_B B_B A_A
311 A_B A_B A_A
312 B_B A_B A_A
314 B_B A_B A_A
I would like to obtain the following (duplicate the colnames and split each element of every column):
9178 9178 3574 3574 3547 3547
160 B B B B A A
301 B B A B A B
303 B B B B A A
311 A B A B A A
312 B B A B A A
314 B B A B A A
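I tried using strsplit, but I get an error message because this is a matrix, not a string. Could you suggest some ideas for solving this?
Here is an option using dplyr (for bind_cols) and tidyr (for separate_), together with lapply from base R. It assumes your data is a data.frame (i.e. you may need to convert it to a data.frame first):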
library(dplyr)
library(tidyr)
lapply(names(df), function(x) separate_(df[x], x, paste0(x, "_", 1:2), sep = "_")) %>%
  bind_cols
# X9178_1 X9178_2 X3574_1 X3574_2 X3547_1 X3547_2
#1 B B B B A A
#2 B B A B A B
#3 B B B B A A
#4 A B A B A A
#5 B B A B A A
#6 B B A B A A
I'm biased, but I'd suggest using cSplit from my "splitstackshape" package. Since your input appears to have rownames, use as.data.table(., keep.rownames = TRUE):
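The matrix-to-data.frame conversion mentioned in this answer could be sketched as follows. The object `m` is a hypothetical two-row stand-in for the real matrix, and the `make.names` step is an assumption: it is only there to reproduce the `X`-prefixed column names seen in the output above.

```r
# Hypothetical toy matrix standing in for the 1000 x 2830 input.
m <- matrix(
  c("B_B", "B_B", "B_B", "A_B"),
  nrow = 2,
  dimnames = list(c("160", "301"), c("9178", "3574"))
)

# Keep the cells as character rather than factor.
df <- as.data.frame(m, stringsAsFactors = FALSE)

# Assumption: make the purely numeric column names syntactic ("9178" -> "X9178").
names(df) <- make.names(names(df))

str(df)
```

The row names of `m` carry over to `df`, so the sample IDs are not lost by the conversion.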
library(splitstackshape)
cSplit(as.data.table(mydf, keep.rownames = TRUE), names(mydf), "_")
# rn X9178_1 X9178_2 X3574_1 X3574_2 X3547_1 X3547_2
# 1: 160 B B B B A A
# 2: 301 B B A B A B
# 3: 303 B B B B A A
# 4: 311 A B A B A A
# 5: 312 B B A B A A
# 6: 314 B B A B A A
Less legible than cSplit (but currently probably faster) would be to use stri_split_fixed from "stringi", like this:
library(stringi)
`dimnames<-`(do.call(cbind,
                     lapply(mydf, stri_split_fixed, "_", simplify = TRUE)),
             list(rownames(mydf), rep(colnames(mydf), each = 2)))
# X9178 X9178 X3574 X3574 X3547 X3547
# 160 "B" "B" "B" "B" "A" "A"
# 301 "B" "B" "A" "B" "A" "B"
# 303 "B" "B" "B" "B" "A" "A"
# 311 "A" "B" "A" "B" "A" "A"
# 312 "B" "B" "A" "B" "A" "A"
# 314 "B" "B" "A" "B" "A" "A"
If speed is of the essence, I'd suggest looking at the "iotools" package, particularly the mstrsplit function. The approach is similar to the "stringi" approach:
library(iotools)
`dimnames<-`(do.call(cbind,
                     lapply(mydf, mstrsplit, "_", ncol = 2, type = "character")),
             list(rownames(mydf), rep(colnames(mydf), each = 2)))
You might need to add an lapply(mydf, as.character) if you forgot to use stringsAsFactors = FALSE when converting from a matrix to a data.frame, but it should still beat even the stri_split approach.
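For comparison, the same reshaping can be sketched in base R directly on the character matrix, without converting to a data.frame first. This is not one of the approaches proposed above, and no claim is made about its speed. The names `m`, `split_col` and `res` are illustrative, and the sketch assumes every cell contains exactly one "_".

```r
# Toy input mirroring the question's layout (character matrix with dimnames).
m <- matrix(
  c("B_B", "B_B", "B_B", "A_B", "B_B", "B_B",
    "B_B", "A_B", "B_B", "A_B", "A_B", "A_B",
    "A_A", "A_B", "A_A", "A_A", "A_A", "A_A"),
  nrow = 6,
  dimnames = list(c("160", "301", "303", "311", "312", "314"),
                  c("9178", "3574", "3547"))
)

# Split one column into an n x 2 character matrix.
# Assumption: each element has exactly one "_" separator.
split_col <- function(x) do.call(rbind, strsplit(x, "_", fixed = TRUE))

# Split every column and bind the pieces side by side.
res <- do.call(cbind, lapply(seq_len(ncol(m)), function(j) split_col(m[, j])))

# Restore the row names and duplicate each column name.
dimnames(res) <- list(rownames(m), rep(colnames(m), each = 2))
res
```

Because the result stays a matrix, the duplicated column names are kept exactly as requested in the question; a data.frame would normally make them unique.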