Kar*_*ius 3 performance r matrix na
I am trying to group the rows of a matrix by the positions of their NA values, so that rows with NAs in exactly the same columns fall into the same group.
For example, with the following matrix:
1, 2, NA, 3, NA
2, 5, NA, 4, 5
3, 2, 1, 0, 7
5, 3, NA, 9, 3
0, 2, 1, 4, 6
The answer would be:
1, 2, 3, 2, 3
This indicates that there are 3 distinct groups and that, e.g., rows 2 and 4 are in the same group.
The trouble is that I cannot come up with a fast way to achieve this. Here is my current implementation:
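# Simulated data: a 10000 x 100 matrix with NAs at 10000 random positions
mat <- matrix(rnorm(10000 * 100), ncol = 100)
mat[sample(length(mat), nrow(mat))] <- NA

getNAgroups <- function(x) {
  # Transpose so that each column holds one row's non-NA pattern
  allnas <- t(!is.na(x))
  # Distinct patterns, one per column
  nacases <- unique(allnas, MARGIN = 2)
  groups <- numeric(nrow(x))
  for (i in seq_len(ncol(nacases))) {
    # Label every row whose pattern agrees with pattern i in all positions
    groups[colMeans(allnas == nacases[, i]) == 1] <- i
  }
  groups
}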
This is a bit too slow for the purposes I have in mind:
system.time(getNAgroups(mat))
user system elapsed
7.672 1.686 9.386
Here is one approach that uses match on a list of NA positions:
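mat <- matrix(c(1, 2, NA, 3, NA,
                2, 5, NA, 4, 5,
                3, 2, 1, 0, 7,
                5, 3, NA, 9, 3,
                0, 2, 1, 4, 6), 5, byrow = TRUE)
# Column indices of the NAs in each row (a list here, since the counts differ)
categ <- apply(is.na(mat), 1, which)
# Number each row by the first occurrence of its NA pattern
match(categ, unique(categ))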
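One caveat with the approach above: apply() simplifies its result to a matrix when every row happens to contain the same number (two or more) of NAs, and match() would then compare individual indices rather than whole rows. A possible variant that avoids this is to collapse each row's NA pattern into a single string key first. This is only a sketch; the helper name na_groups is made up for illustration and is not part of the original answer, and it has not been benchmarked against the versions above.

```r
# Hypothetical helper (not from the original answer): group rows by NA pattern.
na_groups <- function(x) {
  # One string per row, e.g. "00101" when columns 3 and 5 are NA
  key <- apply(is.na(x) + 0L, 1, paste, collapse = "")
  # Number the patterns in order of first appearance
  match(key, unique(key))
}

mat <- matrix(c(1, 2, NA, 3, NA,
                2, 5, NA, 4, 5,
                3, 2, 1, 0, 7,
                5, 3, NA, 9, 3,
                0, 2, 1, 4, 6), 5, byrow = TRUE)
na_groups(mat)
# 1 2 3 2 3
```

Because each row always yields exactly one string, the result is a plain character vector regardless of how the NAs are distributed.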