Posts by Lut*_*ett

How to rate-limit parallel API requests in R/future/furrr

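I have to retrieve a large dataset from a web API (NCBI Entrez) that limits me to a certain number of requests per second, e.g. 10 (the example code will limit you to 3 without an API key). I'm using furrr's future_* functions to parallelize the requests so I can get them as quickly as possible, like this: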

library(tidyverse)
library(rentrez)
library(furrr)

plan(multisession) # `multiprocess` is deprecated in newer versions of future

api_key <- "<api key>"
# this will return a crap-ton of results
srch <- entrez_search("nuccore", "Homo sapiens", use_history=T, api_key=api_key)

total <- srch$count
per_request <- 500 # get 500 records per parallel request
nrequest <- total %/% per_request + as.logical(total %% per_request)

result <- future_map(seq(nrequest),function(x) {
  rstart <- (x - 1) * per_request
  return(entrez_fetch(
    "nuccore",
    web_history = srch$web_history,
    rettype="fasta",
    retmode="xml",
    retstart=rstart, …

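The batching arithmetic above (ceiling division of `total` into `per_request`-sized chunks, with zero-based `retstart` offsets) and the per-second cap can be sketched independently of R. The Python below is a minimal sketch, not an R/furrr solution: `batch_offsets`, `RateLimiter`, `fetch_batch` and `fetch_all` are hypothetical names, the API call is a stub, and it assumes the limit applies to request start times. Each worker reserves the next free start slot under a shared lock, so all workers together stay at or below `rate` starts per second.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def batch_offsets(total: int, per_request: int) -> List[int]:
    # Same arithmetic as the R code: ceiling division, then 0-based retstart values
    nrequest = total // per_request + bool(total % per_request)
    return [(x - 1) * per_request for x in range(1, nrequest + 1)]


class RateLimiter:
    """Spaces request start times at least 1/rate seconds apart, shared by all threads."""

    def __init__(self, rate: float) -> None:
        self.interval = 1.0 / rate
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free start slot under the lock, then sleep outside it
        with self.lock:
            slot = max(time.monotonic(), self.next_slot)
            self.next_slot = slot + self.interval
        delay = slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)


def fetch_batch(limiter: RateLimiter, retstart: int) -> int:
    limiter.wait()
    # Hypothetical stub: a real version would issue the API request here
    return retstart


def fetch_all(offsets: List[int], rate: float, workers: int) -> List[int]:
    limiter = RateLimiter(rate)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, regardless of finish order
        return list(pool.map(lambda s: fetch_batch(limiter, s), offsets))
```

For example, `fetch_all(batch_offsets(total, 500), rate=10, workers=4)` would keep a 10-requests-per-second budget regardless of the worker count. This relies on threads sharing one limiter object; separate worker processes, such as future's multisession workers, would each get their own copy, so they would need some other coordination mechanism or a per-worker share of the budget.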

parallel-processing multithreading r rate-limiting furrr

Lut*_*ett · 2020-05-19 · score 6 · 0 answers · 744 views

How can I use sf to construct/plot convex hulls of polygons from points, grouped by a factor?

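I have a dataset of species occurrences that I'm trying to turn into areas of occurrence by making convex hulls. I can do this manually (i.e. one species at a time), but I'd really like to be able to handle it automatically by species name.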

A pared-down example dataset can be found here: https://pastebin.com/dWxEvyUB

Here's how I'm currently doing it manually:

library(tidyverse)
library(sf)
library(rgeos)
library(maps)
library(mapview)
library(mapdata)
library(ggplot2)


fd <- read_csv("occurrence.csv")

spA.dist <- fd %>%
  filter(species == "sp.A") %>%
  dplyr::select(lon,lat) %>%
  as.matrix() %>%
  coords2Polygons(ID="distribution") %>%
  gConvexHull() %>%
  gBuffer()

spB.dist <- fd %>%
  filter(species == "sp.B") %>%
  dplyr::select(lon,lat) %>%
  as.matrix() %>%
  coords2Polygons(ID="distribution") %>%
  gConvexHull() %>%
  gBuffer() 

wrld2 = st_as_sf(map('world2', plot=F, fill=T))
ggplot() + 
  geom_sf(data=wrld2, fill='gray20',color="lightgrey",size=0.07) +
  geom_polygon(aes(x=long,y=lat,group=group),color="red",data=spA.dist,fill=NA) +
  geom_polygon(aes(x=long,y=lat,group=group),color="blue",data=spB.dist,fill=NA) + 
  coord_sf(xlim=c(100,300), ylim=c(-60,60))


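[Figure: map showing the occurrence areas of the two species, based on convex hulls of the observations.]

I realize I'm mixing different spatial libraries here, so it would be better to do everything in sf if possible. In my real data I have more than two species. I could copy and paste the code I have for each species, but it seems like it should be possible to simplify this so that the polygons (and subsequently the convex hulls) are built automatically for each factor level. Something more like this: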

polys <- st_as_sf(fd) %>%
  group_by(species) %>%
  magically_make_polygons(lon,lat) %>%
  st_convex_hull() %>%
  st_buffer() …

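The idea behind the pseudocode (group the points by `species`, then compute one convex hull per group) can be illustrated without any spatial library. The Python below is a sketch under stated assumptions, not the sf approach: it uses Andrew's monotone-chain algorithm on planar lon/lat coordinates, the `(species, lon, lat)` tuples stand in for the columns of `occurrence.csv`, the function names are hypothetical, and the buffering step is omitted. In sf, the grouping would presumably be done with `group_by()` plus a union of each group's points before `st_convex_hull()`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Iterable[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices counter-clockwise, first point not repeated."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise (or straight) turn
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]


def hulls_by_species(rows: Iterable[Tuple[str, float, float]]) -> Dict[str, List[Point]]:
    # Equivalent of group_by(species): collect each species' (lon, lat) points
    groups: Dict[str, List[Point]] = defaultdict(list)
    for species, lon, lat in rows:
        groups[species].append((lon, lat))
    return {sp: convex_hull(pts) for sp, pts in groups.items()}
```

Sorting once and building the lower and upper chains costs O(n log n) per species; because of the `<= 0` test, points lying exactly on a hull edge are dropped rather than kept as vertices.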

r spatial geos r-mapview r-sf

Lut*_*ett · score 3 · 1 answer · 709 views

Annotating combined subplots (a patchwork?) as a single plot using patchwork

I'm trying to figure out how to annotate combined patchworks as if they were individual plots.

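I have a patchwork made up of three combined plots plus one other single plot. The final composite figure has the first patchwork on top and the single plot on the bottom. I can get the layout I want without any problem, but when I use plot_annotation, it gives every plot its own letter, whereas what I want is an A for the top plot (the patchwork of three subplots) and a B for the bottom plot (just one plot).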

Here's what I'm currently doing:

library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars) + 
  geom_point(aes(mpg, disp)) + 
  ggtitle('Plot 1')

p2 <- ggplot(mtcars) + 
  geom_boxplot(aes(gear, disp, group = gear)) + 
  ggtitle('Plot 2')

p3 <- ggplot(mtcars) + 
  geom_point(aes(hp, wt, colour = mpg)) + 
  ggtitle('Plot 3')

p4 <- ggplot(mtcars) + 
  geom_bar(aes(gear)) + 
  facet_wrap(~cyl) + 
  ggtitle('Plot 4')

top_plot = (p1 + p2 + p3)
bottom_plot = p4
combined_plot <- (top_plot / bottom_plot) + plot_annotation(tag_levels="A")
combined_plot


Instead of the A-D annotations, what I'd like to see is an A for the top plot (plots 1-3) and, for the bottom plot (plot 4), …

graphics plot r ggplot2 patchwork

Lut*_*ett · score 3 · 1 answer · 874 views

Filling background intervals in a ggplotly line chart

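I'm trying to fill the background of a ggplot line chart to indicate day/night periods. The method in this answer works great, but I want to display the plots interactively using ggplotly, and that becomes a problem because of this bug, where ggplotly doesn't like -Inf and Inf being used as the y limits in geom_rect. Does anyone know of a workaround that works with ggplotly?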

For readability, I've pasted the example code from the other answer here:

library(ggplot2)
dat <- data.frame(x = 1:100, y = cumsum(rnorm(100)))
#Breaks for background rectangles
rects <- data.frame(xstart = seq(0,80,20), xend = seq(20,100,20), col = letters[1:5])

p <- ggplot() + 
  geom_rect(data = rects, aes(xmin = xstart, xmax = xend, ymin = -Inf, ymax = Inf, fill = col), alpha = 0.4) +
  geom_line(data = dat, aes(x,y))
p


This produces a lovely figure:

But if you do this:

ggplotly(p) …

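One workaround often suggested for this kind of problem (an assumption here; the post does not confirm it works with ggplotly) is to replace the infinite bounds with the finite range of the plotted data, e.g. `min(dat$y)` and `max(dat$y)` in the R code. The Python below only sketches that substitution step: `finite_rects` and its `pad` argument are hypothetical, and the rectangle records mirror the `rects` data frame from the question with `ymin`/`ymax` added.

```python
import math
from typing import Dict, List, Sequence


def finite_rects(rects: List[Dict[str, float]], ys: Sequence[float], pad: float = 0.0) -> List[Dict[str, float]]:
    """Copy `rects`, swapping ymin=-inf / ymax=+inf for the data's y range (plus optional padding)."""
    lo = min(ys) - pad
    hi = max(ys) + pad
    out = []
    for r in rects:
        fixed = dict(r)  # copy so the caller's records are left untouched
        if fixed["ymin"] == -math.inf:
            fixed["ymin"] = lo
        if fixed["ymax"] == math.inf:
            fixed["ymax"] = hi
        out.append(fixed)
    return out


# Mirrors the `rects` data frame from the question (x breaks every 20 units)
rects = [
    {"xstart": float(x), "xend": float(x + 20), "ymin": -math.inf, "ymax": math.inf}
    for x in range(0, 100, 20)
]
```

A small `pad` keeps the shaded bands from ending exactly at the extreme data points; the trade-off is that, unlike `-Inf`/`Inf`, the bands no longer stretch automatically when the y axis is zoomed or rescaled.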

background r fill ggplot2 plotly

Lut*_*ett · score 2 · 1 answer · 375 views

Pandas equivalent of an R/dplyr group_by + summarise string concatenation

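I have an operation that I need to translate from dplyr (and stringr) in R to pandas in Python. In R it's very straightforward, but I haven't been able to wrap my head around it in pandas yet. Basically, I need to group by one (or more) columns, then concatenate the remaining columns together and collapse them with a separator. R has a nice vectorized function, str_c, that does exactly what I need.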

Here is the R code:

library(tidyverse)
df <- as_tibble(structure(list(file = c(1, 1, 1, 2, 2, 2), marker = c("coi", "12s", "16s", "coi", "12s", "16s"), start = c(1, 22, 99, 12, 212, 199), end = c(15, 35, 102, 150, 350, 1102)), row.names = c(NA, -6L), class = "data.frame"))

df %>%
  group_by(file) %>%
  summarise(markers = str_c(marker, "[", start, ":", end, "]", collapse = "|"))
#> # A tibble: 2 × 2
#>    file markers
#>   <dbl> <chr>
#> …

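A minimal pandas sketch of the same operation, assuming `start` and `end` are stored as integers (with floats, `astype(str)` would render `1.0` rather than `1`): first build the per-row `marker[start:end]` string with vectorized string concatenation, which plays the role of `str_c` without `collapse`, then join each `file` group with `"|".join`, which plays the role of `collapse = "|"`. This is one possible translation, not the only one; `groupby(...).apply` would also work.

```python
import pandas as pd

df = pd.DataFrame({
    "file": [1, 1, 1, 2, 2, 2],
    "marker": ["coi", "12s", "16s", "coi", "12s", "16s"],
    "start": [1, 22, 99, 12, 212, 199],
    "end": [15, 35, 102, 150, 350, 1102],
})

# Per-row "marker[start:end]" strings (vectorized, like str_c without collapse)
labels = df["marker"] + "[" + df["start"].astype(str) + ":" + df["end"].astype(str) + "]"

# Collapse within each file, like summarise(markers = str_c(..., collapse = "|"))
result = (
    labels.groupby(df["file"])
    .agg("|".join)
    .reset_index(name="markers")
)
print(result)
```

Grouping by more than one column works the same way by passing a list of key Series (e.g. `[df["file"], df["other"]]`, where `other` is a hypothetical second key); `groupby` keeps the original row order within each group, so the joined markers appear in the same order as in the R output.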

python r dataframe pandas dplyr

Lut*_*ett · score 2 · 1 answer · 728 views