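I am trying to parallelize a pipeline that contains a tidyr command ("tidyr::complete"). As soon as it runs in parallel, the code breaks because the object class is not recognized.
Is there an alternative to complete in dplyr?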
library(dplyr)
library(tidyr)
library(zoo)
test <- tibble(year=c(1,2,3,4,5,5,1,4,5),
var_1=c(1,1,1,1,1,1,2,2,2),
var_2=c(1,1,1,1,1,2,3,3,3),
var_3=c(0,5,NA,15,20,NA,1,NA,NA))
# Year range used to fill in the missing years of every group
max_year <- max(test$year, na.rm = TRUE)
min_year <- min(test$year, na.rm = TRUE)
Serial
test_serial <- test %>%
group_by(var_1,var_2) %>%
complete(var_1, year = seq(min_year,max_year)) %>%
mutate(
var_3 = na.approx(var_3,na.rm = FALSE),
var_3 = if(all(is.na(var_3))) NA else na.spline(var_3,na.rm = FALSE))
Parallel (fails)
devtools::install_github("hadley/multidplyr")
library(multidplyr)
cl <- new_cluster(2)
cluster_copy(cl, c("test","max_year","min_year"))
cluster_library(cl, c("dplyr","tidyr","zoo"))
test_parallel <- test %>% group_by(var_1,var_2) %>% partition(cl)
test_parallel <- test_parallel %>%
dplyr::group_by(var_1,var_2) %>%
tidyr::complete(var_1, year = seq(min_year,max_year)) %>%
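dplyr::mutate( …
I have a large sparse matrix ("dgCMatrix", dimensions 5e+5 x 1e+6). I need to count how many non-zero values each column has, and make a list of the names of the columns that have exactly 1 non-zero entry.
My code works for small matrices, but it is too computationally expensive for the actual matrix I need to process.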
library(Matrix)
set.seed(0)
mat <- Matrix(matrix(rbinom(200, 1, 0.10), ncol = 20))
colnames(mat) <- letters[1:20]
entries <- colnames(mat[, nrow(mat) - colSums(mat == 0) == 1])
Any suggestions are very welcome!
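One possible direction, offered as a sketch rather than a definitive answer: the expression `mat == 0` is TRUE almost everywhere, so the comparison result is effectively dense, which is probably what makes the original line so expensive. In the column-compressed ("dgCMatrix") format, the `p` slot holds column pointers, so `diff(mat@p)` gives the number of stored entries per column without touching the zeros. The names `nnz_per_col` and `single_entry_cols` are made up for this illustration, and `drop0()` is called first on the assumption that the matrix might contain explicitly stored zeros.

```r
library(Matrix)

# Same toy data as in the question
set.seed(0)
mat <- Matrix(matrix(rbinom(200, 1, 0.10), ncol = 20), sparse = TRUE)
colnames(mat) <- letters[1:20]

# Remove explicitly stored zeros (if any) so stored entries == non-zero entries
mat <- drop0(mat)

# Column pointers: consecutive differences are the per-column entry counts
nnz_per_col <- diff(mat@p)
names(nnz_per_col) <- colnames(mat)

# Names of the columns holding exactly one non-zero entry
single_entry_cols <- colnames(mat)[nnz_per_col == 1]
```

`colSums(mat != 0)` should give the same counts while also staying sparse, since `mat != 0` is TRUE only at the non-zero positions.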
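I have an sf object, a map divided into districts. I would like to compute the centroid of each district (using st_point_on_surface), then compute the pairwise distances between those centroids as a distance matrix that I can run calculations on (for example, keeping only the pairs within a given radius), and get, for each district, its db identifier together with the list of identifiers that meet the criterion.
Apologies in advance for the lack of reproducible code. What is the simplest way to do this?
Thanks in advance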
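Since the question has no data, the following is only a minimal sketch under stated assumptions: it uses the North Carolina counties shapefile that ships with sf as a stand-in for the district map, treats the `NAME` column as the identifier, projects to EPSG:32119 so distances are in metres, and picks an arbitrary 50 km radius. The names `pts`, `dist_mat`, `hits` and `within_ids` are hypothetical.

```r
library(sf)

# Stand-in data: counties bundled with the sf package (assumption, not the asker's map)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 32119)  # projected CRS, units in metres

# One representative point per district, guaranteed to lie inside the polygon
pts <- st_point_on_surface(st_geometry(nc))

# Full pairwise distance matrix (units: metres)
dist_mat <- st_distance(pts)

# For each point, indices of the points within the radius (includes itself)
radius <- 50000
hits <- st_is_within_distance(pts, dist = radius)

# Map indices to identifiers, dropping each district from its own list
within_ids <- lapply(seq_along(hits), function(i) nc$NAME[setdiff(hits[[i]], i)])
names(within_ids) <- nc$NAME
```

For a large number of districts, `st_is_within_distance()` alone may be enough, since it avoids materialising the full distance matrix.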
I have a df listing a number of areas (df$area) and the areas each of them shares a border with (df$next_area).
Starting from it, I want to get a similar df, but with the neighbours of each area's neighbours.
I wrote the following, which works but seems extremely convoluted.
Is there a better solution?
library(dplyr)
library(tidyr)
df <- data.frame(area=c("A","A","B","B","C","C","C","D"),next_area=c("B","C","A" ,"C","A","B","D","C") )
df <- df %>% group_by(area) %>%
summarize(next_area = list(sort(unique(as.character(next_area)))))
df$next_area_exploded <- df$next_area
for(i in 1:nrow(df)){
for(j in …
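A possibly shorter route than nested loops, sketched here in base R: joining the edge list to itself on "neighbour of x equals area of y" yields every two-step path in one call. The names `edges`, `hop2` and `second` are invented for this example, and the sketch assumes that an area should not be listed as its own second-order neighbour while direct neighbours that are also reachable in two steps are kept; the intended rule in the original code is not visible.

```r
# Edge list from the question, one row per (area, neighbour) pair
edges <- data.frame(
  area      = c("A", "A", "B", "B", "C", "C", "C", "D"),
  next_area = c("B", "C", "A", "C", "A", "B", "D", "C"),
  stringsAsFactors = FALSE
)

# Second copy of the edge list, renamed so it can be joined on the neighbour
hop2 <- setNames(edges, c("next_area", "second_area"))

# Each row of the result is a path: area -> next_area -> second_area
paths <- merge(edges, hop2, by = "next_area")

# Drop paths that return to the starting area, then de-duplicate and sort
second <- unique(paths[paths$second_area != paths$area, c("area", "second_area")])
second <- second[order(second$area, second$second_area), ]
rownames(second) <- NULL
```

If one list-column row per area is preferred, as in the original code, `split(second$second_area, second$area)` gives a named list that can be attached to a summary data frame.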