shN*_*NIL 5 sorting split r dataframe
I have the following problem, part of which I can solve:
set.seed(1234)
mydf <- data.frame(var1a = sample(c("TA", "AA", "TT"), 5, replace = TRUE),
                   varb2 = sample(c("GA", "AA", "GG"), 5, replace = TRUE),
                   varAB = sample(c("AC", "AA", "CC"), 5, replace = TRUE)
)
mydf
var1a varb2 varAB
1 TA AA CC
2 AA GA AA
3 AA GA AC
4 AA AA CC
5 TT AA AC
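I want to split the two letters into separate columns and then sort them alphabetically.
Edit: the sorting can be done before splitting, e.g. the var1a value "TA" should become "AT", or after splitting, so that var1aa is "A" and var1ab is "T" (rather than "T", "A"). So the sorting happens within each cell.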
# Split one two-letter column into two single-letter columns named <col>a and <col>b
split_col <- function(.col, data){
  .x <- colsplit(data[[.col]], names = paste0(.col, letters[1:2]))
}
Split each column and combine:
require(reshape)
splitdf <- do.call(cbind, lapply(names(mydf), split_col, data = mydf))
var1aa var1ab varb2a varb2b varABa varABb
1 T A A A C C
2 A A G A A A
3 A A G A A C
4 A A A A C C
5 T T A A A C
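But the unsolved part is that I want to order each pair of columns so that, within each row, the value in the "a" column and the value in the "b" column are in alphabetical order. Hence the expected output: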
var1aa var1ab varb2a varb2b varABa varABb
1 A T A A C C
2 A A A G A A
3 A A A G A C
4 A A A A C C
5 T T A A A C
How can I do this ordering (sorting within each pair of variables)?
mylist <-as.list(mydf)
splits <- lapply(mylist, reshape::colsplit, names=c("a", "b"))
rowsort <- lapply(splits, function(x) t(apply(x, 1, sort)))
comb <- do.call(data.frame, rowsort)
comb
var1a.1 var1a.2 varb2.1 varb2.2 varAB.a varAB.b
1 A T A A C C
2 A A A G A A
3 A A A G A C
4 A A A A C C
5 T T A A A C
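The mixed suffixes in that output (.1/.2 for var1a and varb2, but .a/.b for varAB) appear to come from how apply() handles names: sort() keeps the names a and b attached to each value, and apply() seems to keep result names only when every row returns them in the same order. A minimal sketch of that behaviour, using two small made-up data frames (`mixed` and `presorted` are illustrative names, not part of the original code):

```r
# Columns a and b, as colsplit(names = c("a", "b")) would produce them
mixed     <- data.frame(a = c("T", "A"), b = c("A", "A"), stringsAsFactors = FALSE)
presorted <- data.frame(a = c("A", "C"), b = c("C", "C"), stringsAsFactors = FALSE)

# Row 1 of `mixed` is reordered by sort(), so its names come back as b, a,
# while row 2 keeps a, b; the names disagree across rows and are dropped
mixed_names <- colnames(t(apply(mixed, 1, sort)))

# Every row of `presorted` is already in order, so the names a, b survive
presorted_names <- colnames(t(apply(presorted, 1, sort)))

print(mixed_names)
print(presorted_names)
```

When the names are dropped, do.call(data.frame, ...) falls back to numeric suffixes, which would explain why only varAB keeps .a and .b.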
Edit: if the names matter, you can replace them:
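replaceNums <- function(x){
  # keep the part of each name before the dot, e.g. "var1a.1" -> "var1a"
  .which <- regmatches(x, regexpr("[[:alnum:]]*(?=\\.)", x, perl=TRUE))
  stopifnot(length(x) %% 2 == 0) # check step: columns must come in pairs
  # append "a" and "b" alternately within each pair
  paste0(.which, c("a", "b"))
}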
names(comb) <- replaceNums(names(comb))
comb
var1aa var1ab varb2a varb2b varABa varABb
1 A T A A C C
2 A A A G A A
3 A A A G A C
4 A A A A C C
5 T T A A A C
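As an alternative, the "sort before splitting" route mentioned in the question can be sketched in base R without the reshape package. This is only a sketch under some assumptions: the data frame is typed in by hand from the printed table rather than regenerated with set.seed(), since sample() may give different values in newer R versions, and the helpers `sort_letters` and `split_sorted` are hypothetical names introduced here.

```r
# Example data copied from the table printed in the question
mydf <- data.frame(
  var1a = c("TA", "AA", "AA", "AA", "TT"),
  varb2 = c("AA", "GA", "GA", "AA", "AA"),
  varAB = c("CC", "AA", "AC", "CC", "AC"),
  stringsAsFactors = FALSE
)

# Sort the characters inside each string, e.g. "TA" -> "AT"
sort_letters <- function(s) {
  vapply(strsplit(as.character(s), ""),
         function(ch) paste(sort(ch), collapse = ""),
         character(1))
}

# Sort within each cell first, then split into <col>a and <col>b
split_sorted <- function(col, data) {
  s <- sort_letters(data[[col]])
  out <- data.frame(substr(s, 1, 1), substr(s, 2, 2),
                    stringsAsFactors = FALSE)
  names(out) <- paste0(col, c("a", "b"))
  out
}

sorted_df <- do.call(cbind, lapply(names(mydf), split_sorted, data = mydf))
sorted_df
```

Because the column names are set inside `split_sorted`, no renaming step is needed afterwards. The sketch assumes every cell holds exactly two letters, as in the example data.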