R: Fast way to group matrix rows by considering only the place of NA values

Kar*_*ius 3 performance r matrix na

I am trying to group the rows of a matrix by the pattern of NA positions across their columns.

For example with the following matrix:

1, 2, NA, 3, NA
2, 5, NA, 4, 5
3, 2,  1, 0, 7
5, 3, NA, 9, 3
0, 2,  1, 4, 6

The answer would be:

1, 2, 3, 2, 3

This indicates that there are 3 distinct groups and that, for example, rows 2 and 4 are in the same group.

The trouble is that I cannot come up with a quick way to achieve this. Here is my current implementation:

mat <- matrix(rnorm(10000*100), ncol=100)
mat[sample(length(mat), nrow(mat))] <- NA

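getNAgroups <- function(x) {
  # One column per row of x; TRUE marks a non-NA entry
  allnas  <- t(!is.na(x))
  # Distinct NA patterns, in order of first appearance
  nacases <- unique(allnas, MARGIN=2)
  groups  <- numeric(nrow(x))
  for(i in seq_len(ncol(nacases))) {
    # Rows whose pattern matches pattern i in every position get label i
    groups[colMeans(allnas == nacases[,i]) == 1] <- i
  }
  groups
}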

This is a bit too slow for the purposes I have in mind:

system.time(getNAgroups(mat))
   user  system elapsed
  7.672   1.686   9.386

Cle*_*ang 5

Here is one approach that uses match on a list of NA positions:

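mat <- matrix(c(1, 2, NA, 3, NA,
                2, 5, NA, 4,  5,
                3, 2,  1, 0,  7,
                5, 3, NA, 9,  3,
                0, 2,  1, 4,  6), 5, byrow = TRUE)

# For each row, the column indices that hold NA (a list here, since the counts differ)
categ <- apply(is.na(mat), 1, which)
# Label each row by the first occurrence of its NA pattern
match(categ, unique(categ))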
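One caveat with the `apply(is.na(mat), 1, which)` call: `apply` simplifies its result, so if every row happens to contain the same number of NAs (two or more), it returns a matrix instead of a list and `match` no longer yields one label per row. A possible workaround, sketched here under the assumption that the matrix has at least one row, is to collapse each row's NA positions into a single string key first. The name `na_groups` is a hypothetical helper, not part of the original answer.

```r
# Hypothetical helper (not from the original answer): group rows by NA pattern.
na_groups <- function(x) {
  # One string key per row, e.g. "3,5" for NAs in columns 3 and 5, "" for none.
  # Returning a length-1 string per row keeps apply() from building a matrix.
  key <- apply(is.na(x), 1, function(r) paste(which(r), collapse = ","))
  # Number the keys in order of first appearance.
  match(key, unique(key))
}

mat <- matrix(c(1, 2, NA, 3, NA,
                2, 5, NA, 4,  5,
                3, 2,  1, 0,  7,
                5, 3, NA, 9,  3,
                0, 2,  1, 4,  6), nrow = 5, byrow = TRUE)

groups <- na_groups(mat)

# A case where every row has exactly two NAs
mat2 <- matrix(c(NA, NA,  1,
                  1, NA, NA,
                 NA, NA,  2), nrow = 3, byrow = TRUE)

groups2 <- na_groups(mat2)
```

For the example matrix this gives the same labels as the list-based version, and it should still return one label per row for `mat2`. Building the keys does extra string work, so it is worth timing on the full 10000 x 100 matrix before relying on it.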