Tav*_*nis 2 optimization r cluster-analysis depth-first-search
给定 R 编程语言中的以下矩阵:
set.seed(123)
matrix_1 <- matrix(rbinom(100, 1, 0.5), nrow = 10, ncol = 10)
Run Code Online (Sandbox Code Playgroud)
这是一个深度优先搜索 (DFS) 算法,用于识别该矩阵中 1 的簇。在此上下文中,“簇”是整数在矩阵上的连续映射,最小簇大小为 3,并假设 8 个连通性(即,包括对角线)。注意:我尝试对包使用基于图像的方法EBImage,但其执行速度对于我的目的来说太慢了。我有数千个 100X100 矩阵需要分析!
find_clusters <- function(matrix) {
rows <- nrow(matrix)
cols <- ncol(matrix)
# Create a matrix of the same size to mark visited cells
visited <- matrix(0, nrow = rows, ncol = cols)
# Define all 8 possible movements from a cell (8-connectivity)
row_nbr <- c(-1, -1, -1, 0, 0, 1, 1, 1)
col_nbr <- c(-1, 0, 1, -1, 1, -1, 0, 1)
# A function to check if a cell can be included in the DFS
is_valid <- function(row, col) {
row >= 1 && row <= rows && col >= 1 && col <= cols &&
visited[row, col] == 0 && matrix[row, col] == 1
}
# A function to do a DFS of a 2D boolean matrix. It only considers
# the 8 cells directly connected to a cell
DFS <- function(matrix, row, col, visited, cluster) {
row_stack <- c(row)
col_stack <- c(col)
while (length(row_stack) > 0) {
r <- row_stack[length(row_stack)]
c <- col_stack[length(col_stack)]
row_stack <- row_stack[-length(row_stack)]
col_stack <- col_stack[-length(col_stack)]
if (visited[r, c] == 0) {
visited[r, c] <- 1
cluster <- rbind(cluster, c(r, c))
for (k in 1:8) {
if (is_valid(r + row_nbr[k], c + col_nbr[k])) {
row_stack <- c(row_stack, r + row_nbr[k])
col_stack <- c(col_stack, c + col_nbr[k])
}
}
}
}
return(cluster)
}
# The main function that returns all clusters
get_clusters <- function(matrix, visited) {
clusters <- list()
for (i in 1:rows) {
for (j in 1:cols) {
if (visited[i, j] == 0 && matrix[i, j] == 1) {
new_cluster <- DFS(matrix, i, j, visited, matrix(, nrow = 0, ncol = 2))
if (nrow(new_cluster) >= 3) {
clusters[[length(clusters) + 1]] <- new_cluster
}
}
}
}
return(clusters)
}
return(get_clusters(matrix, visited))
}
Run Code Online (Sandbox Code Playgroud)
效果很好而且速度很快。但是,此函数返回大小 > 3 的所有可能簇(总共 44 个簇),其中包括嵌套在较大簇内的较小簇。
矩阵作为二值图像:
my_palette <- c("white", "black")
# correct for how image() reads a matrix
rotate <- function(x) t(apply(x, 2, rev))
image(rotate(matrix_1),
axes = FALSE,
col = my_palette_2)
Run Code Online (Sandbox Code Playgroud)
我只看到三个大小 >= 3 的簇。如何修改我的函数以仅“看到”矩阵上最大的完整簇?
更新
谢谢@I_O!我有来自 MATLAB 模拟的 10000 个 100X100 矩阵,用于模拟细胞膜上钠通道的行为。以下函数实现您的建议并返回通道类型 1 和 2 的簇大小:
library(dplyr)
library(terra)
# M: matrix of integers
find_clusters_2chan <- function(M) {
# Consider only 1s
ones <- M == 1
# convert matrix to raster
raster_ones <- ones |> rast()
# find clusters (consider zeros as NA, i. e. discontinuation)
clusters_ones <- patches(raster_ones,
directions = 8,
zeroAsNA = TRUE)
# generate frequency table
ones_freq <- the_clusters_ones |> freq()
# return counts >=3
ones_freq$count %>%
.[. >= 3] -> ONES
#-------------------------------------------------------------------------------
# Consider only 2s
twos <- M == 2
# convert matrix to raster
raster_twos <- twos |> rast()
# find clusters (consider zeros as NA, i. e. discontinuation)
clusters_twos <- patches(raster_twos,
directions = 8,
zeroAsNA = TRUE)
# generate frequency table
twos_freq <- clusters_twos |> freq()
# return counts >=3
twos_freq$count %>%
.[. >= 3] -> TWOS
clusters_list <- list(channel_1 = ONES,
channel_2 = TWOS)
return(clusters_list)
}
start <- Sys.time()
clusters_big_list <- lapply(list_of_matrices, find_clusters_2chan)
end <- Sys.time()
end - start
# run time = 3.902859 minutes
Run Code Online (Sandbox Code Playgroud)
I_O*_*_O 6
如果您愿意使用像 {terra} 这样的专用包进行栅格分析,那么以下操作应该方便快捷(编辑:包装函数以概括跨通道值和底部的最小簇大小)
library(terra)
set.seed(123)
matrix_1 <- matrix(rbinom(100, 1, 0.5), nrow = 10, ncol = 10)
Run Code Online (Sandbox Code Playgroud)
raster_1 <- matrix_1 |> rast()
Run Code Online (Sandbox Code Playgroud)
the_clusters <- patches(raster_1, directions = 8, zeroAsNA=TRUE)
Run Code Online (Sandbox Code Playgroud)
the_clusters |> plot()
Run Code Online (Sandbox Code Playgroud)
the_clusters |> freq()
Run Code Online (Sandbox Code Playgroud)
layer value count
1 1 1 24
2 1 2 8
3 1 3 1
4 1 4 2
5 1 5 12
Run Code Online (Sandbox Code Playgroud)
为任意数量的通道提取最小大小的簇的函数可能如下所示:
find_clusters_2_all_chans <-
function(M, channel_numbers, min_count) {
channel_numbers |>
Map(f = \(i){
M |>
rast() |>
app(fun = \(cell) ifelse(cell == i, cell, NA)) |>
patches(directions = 8, zeroAsNA = TRUE) |>
freq()|>
(\(m) m[m['count'] > min_count,])()
}) |>
setNames(paste0('channel_', channel_numbers))
}
Run Code Online (Sandbox Code Playgroud)
示例:为通道 1 和 2 选择大于三个像素的簇
find_clusters_2_all_chans(M, 1:2, 3)
Run Code Online (Sandbox Code Playgroud)
> $channel_1
layer value count
1 1 1 4
2 1 2 10
3 1 3 6
4 1 5 6
6 1 7 4
$channel_2
layer value count
5 1 5 12
Run Code Online (Sandbox Code Playgroud)