Ann*_*lli 6 memory performance r
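I'm trying to put together some material on the dangers of growing data structures in `for` loops in R. I'd like to be able to explain what happens behind the scenes to cause the performance differences, especially the huge differences in memory use between approaches.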
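I compare 3 approaches: growing the result vector with the `c()` function, growing it through assignment (`y[i] <- ...`), and preallocating the result vector. Consider the following reprex: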
```r
library(pryr)

x <- runif(10, min = 1, max = 100)

# Create function that appends to result vector through c
for_loop_c <- function(x, print = TRUE) {
  y <- NULL
  for (i in seq_along(x)) {
    y <- c(y, sqrt(x[i]))
    if (print) {
      print(c(address(y), refs(y)))
    }
  }
  y
}

# Create function that appends to result vector through assignment
for_loop_assign <- function(x, print = TRUE) {
  y <- NULL
  for (i in seq_along(x)) {
    y[i] <- sqrt(x[i])
    if (print) {
      print(c(address(y), refs(y)))
    }
  }
  y
}

# Create function that preallocates result vector
for_loop_preallocate <- function(x, print = TRUE) {
  y <- numeric(length(x))

  for (i in seq_along(x)) {
    y[i] <- sqrt(x[i])

    if (print) {
      print(c(address(y), refs(y)))
    }
  }
  y
}

# Run functions and check for copies by changes to address and refs
for_loop_c(x)
#> [1] "0x11bfbdbf8" "1"
#> [1] "0x11bf9b948" "1"
#> [1] "0x11bf9f398" "1"
#> [1] "0x11bf9f258" "1"
#> [1] "0x11bf82938" "1"
#> [1] "0x11bf82778" "1"
#> [1] "0x11bf825b8" "1"
#> [1] "0x11bf823f8" "1"
#> [1] "0x11bf55768" "1"
#> [1] "0x11bf55608" "1"
#> [1] 3.976751 6.148983 9.373843 7.928771 5.321063 7.238960 5.707823 9.921684
#> [9] 7.643938 3.764301
for_loop_assign(x)
#> [1] "0x11c2ee4e8" "1"
#> [1] "0x11c2bb608" "1"
#> [1] "0x11c2b6c28" "1"
#> [1] "0x11c2b6ae8" "1"
#> [1] "0x11c224d48" "1"
#> [1] "0x11c224b88" "1"
#> [1] "0x11c2249c8" "1"
#> [1] "0x11c224808" "1"
#> [1] "0x11c2d3748" "1"
#> [1] "0x11c2d35e8" "1"
#> [1] 3.976751 6.148983 9.373843 7.928771 5.321063 7.238960 5.707823 9.921684
#> [9] 7.643938 3.764301
for_loop_preallocate(x)
#> [1] "0x11c5b8888" "1"
#> [1] "0x11c5b8888" "1"
#> [1] "0x11c5b8888" "1"
#> [1] "0x11c5b8888" "1"
#> [1] "0x11c5b8888" "1"
#> [1] "0x11c5b8888" "1"
#> [1] "0x11c5b8888" "1"
#> [1] "0x11c5b8888" "1"
#> [1] "0x11c5b8888" "1"
#> [1] "0x11c5b8888" "1"
#> [1] 3.976751 6.148983 9.373843 7.928771 5.321063 7.238960 5.707823 9.921684
#> [9] 7.643938 3.764301

# Create a bigger example x for benchmarking
x <- runif(10000, min = 1, max = 100)

# Benchmark
bench::mark(
  for_loop_c(x, print = FALSE),
  for_loop_assign(x, print = FALSE),
  for_loop_preallocate(x, print = FALSE)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 3 × 6
#>   expression                                  min   median `itr/sec` mem_alloc
#>   <bch:expr>                             <bch:tm> <bch:tm>     <dbl> <bch:byt>
#> 1 for_loop_c(x, print = FALSE)              106ms 114.92ms      8.57  381.96MB
#> 2 for_loop_assign(x, print = FALSE)        1.19ms   1.27ms    621.      1.66MB
#> 3 for_loop_preallocate(x, print = FALSE) 381.71µs 386.88µs   2554.     78.17KB
#> # … with 1 more variable: `gc/sec` <dbl>


library(profmem)
gc()
#>           used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells  824931 44.1    1409852 75.3         NA  1409852 75.3
#> Vcells 1483448 11.4    8388608 64.0      32768  8388585 64.0

pm1 <- profmem({
  y <- NULL
  for (i in seq_along(x)) {
    y <- c(y, sqrt(x[i]))
  }

})


pm2 <- profmem({
  y <- NULL
  for (i in seq_along(x)) {
    y[i] <- sqrt(x[i])
  }
  y

})

# Number of times memory allocation occurred
pm1$bytes |> length()
#> [1] 10061
pm2$bytes |> length()
#> [1] 174
```

Created on 2023-02-02 with reprex v2.0.2

Session info

```r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       macOS Monterey 12.3.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Athens
#>  date     2023-02-02
#>  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package     * version date (UTC) lib source
#>    bench         1.1.2   2021-11-30 [1] CRAN (R 4.2.0)
#>    cli           3.6.0   2023-01-09 [1] CRAN (R 4.2.0)
#>    codetools     0.2-18  2020-11-04 [2] CRAN (R 4.2.1)
#>  P digest        0.6.29  2021-12-01 [?] CRAN (R 4.2.0)
#>  P evaluate      0.16    2022-08-09 [?] CRAN (R 4.2.1)
#>    fansi         1.0.3   2022-03-24 [2] CRAN (R 4.2.0)
#>  P fastmap       1.1.0   2021-01-25 [?] CRAN (R 4.2.0)
#>    fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
#>  P glue          1.6.2   2022-02-24 [?] CRAN (R 4.2.0)
#>  P highr         0.9     2021-04-16 [?] CRAN (R 4.2.1)
#>  P htmltools     0.5.3   2022-07-18 [?] CRAN (R 4.2.0)
#>  P knitr         1.40    2022-08-24 [?] CRAN (R 4.2.0)
#>    lifecycle     1.0.3   2022-10-07 [2] CRAN (R 4.2.0)
#>  P magrittr      2.0.3   2022-03-30 [?] CRAN (R 4.2.0)
#>    pillar        1.8.1   2022-08-19 [2] CRAN (R 4.2.0)
#>    pkgconfig     2.0.3   2019-09-22 [2] CRAN (R 4.2.0)
#>  P profmem     * 0.6.0   2020-12-13 [?] CRAN (R 4.2.0)
#>    pryr        * 0.1.6   2023-01-17 [1] CRAN (R 4.2.0)
#>    purrr         1.0.1   2023-01-10 [1] CRAN (R 4.2.0)
#>  P R.cache       0.16.0  2022-07-21 [?] CRAN (R 4.2.0)
#>  P R.methodsS3   1.8.2   2022-06-13 [?] CRAN (R 4.2.0)
#>  P R.oo          1.25.0  2022-06-12 [?] CRAN (R 4.2.0)
#>  P R.utils       2.12.2  2022-11-11 [?] CRAN (R 4.2.0)
#>    Rcpp          1.0.9   2022-07-08 [2] CRAN (R 4.2.0)
#>    reprex        2.0.2   2022-08-17 [2] CRAN (R 4.2.0)
#>    rlang         1.0.6   2022-09-24 [1] CRAN (R 4.2.0)
#>  P rmarkdown     2.16    2022-08-24 [?] CRAN (R 4.2.0)
#>    rstudioapi    0.14    2022-08-22 [2] CRAN (R 4.2.0)
#>    sessioninfo   1.2.2   2021-12-06 [2] CRAN (R 4.2.0)
#>  P stringi       1.7.8   2022-07-11 [?] CRAN (R 4.2.0)
#>  P stringr       1.4.1   2022-08-20 [?] CRAN (R 4.2.0)
#>  P styler        1.9.0   2023-01-15 [?] CRAN (R 4.2.0)
#>    tibble        3.1.8   2022-07-22 [2] CRAN (R 4.2.0)
#>    utf8          1.2.2   2021-07-24 [2] CRAN (R 4.2.0)
#>  P vctrs         0.5.1   2022-11-16 [?] CRAN (R 4.2.0)
#>    withr         2.5.0   2022-03-03 [2] CRAN (R 4.2.0)
#>  P xfun          0.33    2022-09-12 [?] CRAN (R 4.2.1)
#>  P yaml          2.3.5   2022-02-21 [?] CRAN (R 4.2.0)
#>
#>  [1] /*/renv/library/R-4.2/aarch64-apple-darwin20
#>  [2] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
#>
#>  P ── Loaded and on-disk path mismatch.
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

I understand why preallocation is the most efficient: no copies are made, and the same address is worked on at every iteration.
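
What I think is happening is that, with `c()`, a full copy of `y` is made inside the function and then another copy is made when the result is assigned back to `y`, whereas when growing through assignment one copy is made (hence the address change), but only during the assignment?

My questions are:

Edit

Following feedback from @Kevin-Ushey and @alexis_laz, I adjusted the example to record the cumulative number of address changes at each iteration: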

```r
library(pryr)
library(ggplot2)

# Create function that appends to result vector through c
# Collect cumulative number of address changes per iteration
for_loop_c <- function(x, count_addr = TRUE) {
  y <- NULL
  y_addr <- address(y)
  cum_address_n <- 0
  cum_address_n_v <- numeric(length(x))

  for (i in seq_along(x)) {
    y <- c(y, sqrt(x[i]))
    if (address(y) != y_addr) {
      cum_address_n <- cum_address_n + 1
      y_addr <- address(y)
    }

    cum_address_n_v[i] <- cum_address_n
  }
  data.frame(i = seq_along(cum_address_n_v),
             cum_address_n = cum_address_n_v,
             mode = "c")
}

# Create function that appends to result vector through assignment.
# Collect cumulative number of address changes per iteration
for_loop_assign <- function(x) {
  y <- NULL
  y_addr <- address(y)
  cum_address_n <- 0
  cum_address_n_v <- numeric(length(x))

  for (i in seq_along(x)) {

    y[i] <- sqrt(x[i])
    if (address(y) != y_addr) {
      cum_address_n <- cum_address_n + 1
      y_addr <- address(y)
    }
    cum_address_n_v[i] <- cum_address_n
  }
  data.frame(i = seq_along(cum_address_n_v),
             cum_address_n = cum_address_n_v,
             mode = "assign")
}


x <- runif(10000, min = 1, max = 100)

rbind(for_loop_c(x), for_loop_assign(x)) |>
  ggplot(aes(x = i, y = cum_address_n, colour = mode)) +
  geom_line()
```

Created on 2023-02-03 with reprex v2.0.2

Session info

```r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       macOS Monterey 12.3.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Athens
#>  date     2023-02-03
#>  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package     * version date (UTC) lib source
#>  P assertthat    0.2.1   2019-03-21 [?] CRAN (R 4.2.0)
#>    cli           3.6.0   2023-01-09 [1] CRAN (R 4.2.0)
#>    codetools     0.2-18  2020-11-04 [2] CRAN (R 4.2.1)
#>  P colorspace    2.0-3   2022-02-21 [?] CRAN (R 4.2.1)
#>    curl          4.3.2   2021-06-23 [2] CRAN (R 4.2.0)
#>    DBI           1.1.3   2022-06-18 [1] CRAN (R 4.2.0)
#>  P digest        0.6.29  2021-12-01 [?] CRAN (R 4.2.0)
#>    dplyr         1.0.10  2022-09-01 [2] CRAN (R 4.2.0)
#>  P evaluate      0.16    2022-08-09 [?] CRAN (R 4.2.1)
#>    fansi         1.0.3   2022-03-24 [2] CRAN (R 4.2.0)
#>  P farver        2.1.1   2022-07-06 [?] CRAN (R 4.2.1)
#>  P fastmap       1.1.0   2021-01-25 [?] CRAN (R 4.2.0)
#>    fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
#>    generics      0.1.3   2022-07-05 [2] CRAN (R 4.2.0)
#>  P ggplot2     * 3.4.0   2022-11-04 [?] CRAN (R 4.2.0)
#>  P glue          1.6.2   2022-02-24 [?] CRAN (R 4.2.0)
#>  P gtable        0.3.1   2022-09-01 [?] CRAN (R 4.2.1)
#>  P highr         0.9     2021-04-16 [?] CRAN (R 4.2.1)
#>  P htmltools     0.5.3   2022-07-18 [?] CRAN (R 4.2.0)
#>    httr          1.4.4   2022-08-17 [2] CRAN (R 4.2.0)
#>  P knitr         1.40    2022-08-24 [?] CRAN (R 4.2.0)
#>  P labeling      0.4.2   2020-10-20 [?] CRAN (R 4.2.1)
#>    lifecycle     1.0.3   2022-10-07 [2] CRAN (R 4.2.0)
#>  P magrittr      2.0.3   2022-03-30 [?] CRAN (R 4.2.0)
#>    mime          0.12    2021-09-28 [2] CRAN (R 4.2.0)
#>  P munsell       0.5.0   2018-06-12 [?] CRAN (R 4.2.1)
#>    pillar        1.8.1   2022-08-19 [2] CRAN (R 4.2.0)
#>    pkgconfig     2.0.3   2019-09-22 [2] CRAN (R 4.2.0)
#>    pryr        * 0.1.6   2023-01-17 [1] CRAN (R 4.2.0)
#>    purrr         1.0.1   2023-01-10 [1] CRAN (R 4.2.0)
#>  P R.cache       0.16.0  2022-07-21 [?] CRAN (R 4.2.0)
#>  P R.methodsS3   1.8.2   2022-06-13 [?] CRAN (R 4.2.0)
#>  P R.oo          1.25.0  2022-06-12 [?] CRAN (R 4.2.0)
#>  P R.utils       2.12.2  2022-11-11 [?] CRAN (R 4.2.0)
#>  P R6            2.5.1   2021-08-19 [?] CRAN (R 4.2.0)
#>    Rcpp          1.0.9   2022-07-08 [2] CRAN (R 4.2.0)
#>    reprex        2.0.2   2022-08-17 [2] CRAN (R 4.2.0)
#>    rlang         1.0.6   2022-09-24 [1] CRAN (R 4.2.0)
#>  P rmarkdown     2.16    2022-08-24 [?] CRAN (R 4.2.0)
#>    rstudioapi    0.14    2022-08-22 [2] CRAN (R 4.2.0)
#>  P scales        1.2.1   2022-08-20 [?] CRAN (R 4.2.1)
#>    sessioninfo   1.2.2   2021-12-06 [2] CRAN (R 4.2.0)
#>  P stringi       1.7.8   2022-07-11 [?] CRAN (R 4.2.0)
#>  P stringr       1.4.1   2022-08-20 [?] CRAN (R 4.2.0)
#>  P styler        1.9.0   2023-01-15 [?] CRAN (R 4.2.0)
#>    tibble        3.1.8   2022-07-22 [2] CRAN (R 4.2.0)
#>  P tidyselect    1.2.0   2022-10-10 [?] CRAN (R 4.2.0)
#>    utf8          1.2.2   2021-07-24 [2] CRAN (R 4.2.0)
#>  P vctrs         0.5.1   2022-11-16 [?] CRAN (R 4.2.0)
#>    withr         2.5.0   2022-03-03 [2] CRAN (R 4.2.0)
#>  P xfun          0.33    2022-09-12 [?] CRAN (R 4.2.1)
#>    xml2          1.3.3   2021-11-30 [2] CRAN (R 4.2.0)
#>  P yaml          2.3.5   2022-02-21 [?] CRAN (R 4.2.0)
#>
#>  [1] /*/optimise-r/renv/library/R-4.2/aarch64-apple-darwin20
#>  [2] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
#>
#>  P ── Loaded and on-disk path mismatch.
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

Based on the feedback in the answers and comments, my interpretation is:
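
When growing through assignment, address changes become progressively less frequent once `y` has grown to the point where it is no longer managed through R's small vector pool but through requests to the operating system for additional memory. I take this to mean that, with larger vectors, R can modify the object in place between requests for additional memory, and because the modification in each iteration is very small, the algorithm can run for a fair number of iterations without requesting more memory.

Using `c()` triggers an address change at every iteration. However, it is still not clear to me whether this is because `y` is modified inside `c()` and a copy is therefore triggered, or whether it has to do with assigning a whole new `y` to `y` rather than assigning one additional element?

R (since version 3.4.0) allocates some extra memory for atomic vectors, so "growing" such a vector via sub-assignment may not require a reallocation if some spare capacity is still available. The R Internals manual discusses this a little; see the references to the "truelength" of a vector: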
https://cran.r-project.org/doc/manuals/r-release/R-ints.html#The-_0027data_0027
https://cran.r-project.org/doc/manuals/r-release/R-ints.html#FOOT3
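
A rough way to see this spare capacity at work is to count how often the address of `y` changes under each growth strategy. The sketch below reuses `pryr::address()` from the question and assumes the pryr package is installed; the helper `count_address_changes()` and its `mode` argument are illustrative names, and the exact counts will depend on the R version and platform.

```r
library(pryr)

# Hypothetical helper: grow `y` one element at a time and count how often
# its memory address changes. The growth step is written inline (not passed
# in as a function) so that `y` is only ever referenced from this frame.
count_address_changes <- function(x, mode = c("assign", "c")) {
  mode <- match.arg(mode)
  y <- NULL
  last_addr <- address(y)
  changes <- 0L

  for (i in seq_along(x)) {
    if (mode == "c") {
      y <- c(y, sqrt(x[i]))   # always builds a brand-new vector
    } else {
      y[i] <- sqrt(x[i])      # may reuse spare capacity (R >= 3.4.0)
    }
    addr <- address(y)
    if (addr != last_addr) {
      changes <- changes + 1L
      last_addr <- addr
    }
  }
  changes
}

x <- runif(5000, min = 1, max = 100)
n_changes <- c(
  c      = count_address_changes(x, "c"),
  assign = count_address_changes(x, "assign")
)
n_changes
```

On R 3.4.0 or later, the `assign` count should come out well below the `c` count, which is consistent with the allocation counts reported by `profmem` in the question.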
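So, whereas the conventional wisdom used to be "always preallocate your vectors" and "avoid for loops", nowadays growing a vector via sub-assignment may be a reasonable solution if the final capacity of the vector is unknown.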
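
As a minimal sketch of that situation, the loop below collects a Collatz sequence, whose length is not known in advance, by growing the result via sub-assignment. The function name `collatz_path()` is illustrative and does not come from the original post.

```r
# Hypothetical example: the number of steps is unknown up front, so the
# result vector is grown one element at a time via sub-assignment.
collatz_path <- function(start) {
  stopifnot(start >= 1, start == round(start))
  path <- numeric(0)
  value <- start

  repeat {
    path[length(path) + 1L] <- value   # grow by one element
    if (value == 1) break
    value <- if (value %% 2 == 0) value / 2 else 3 * value + 1
  }
  path
}

collatz_path(6)
#> [1]  6  3 10  5 16  8  4  2  1
```

If an upper bound on the length were known, preallocating to that bound and trimming afterwards would remain an option.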
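This, together with byte-compilation of functions, means that some of the common wisdom about avoiding for loops is no longer as true as it once was. (However, the best-performing R code will typically still be written in a functional style, or will require careful preallocation of memory/vectors and avoidance of frequent allocations.)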
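
For the element-wise square root used throughout the question, the functional and vectorised forms look roughly like this. It is a small sketch for comparison with the preallocated loop, not a benchmark; relative timings will vary with machine and R version.

```r
x <- runif(1000, min = 1, max = 100)

# Preallocated loop, as in the question
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
  y_loop[i] <- sqrt(x[i])
}

# Functional style: vapply() knows the result type and length up front
y_vapply <- vapply(x, sqrt, numeric(1))

# Vectorised call: sqrt() processes the whole vector at once
y_vec <- sqrt(x)

# All three approaches produce the same values
stopifnot(identical(y_loop, y_vec), identical(y_vapply, y_vec))
```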