Example of modifying a list in Advanced R

lyh*_*817 6 memory for-loop r list dataframe

I can't seem to understand the following example from Advanced R:

x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))

y <- as.list(x)
cat(tracemem(y), "\n")
#> <0x7f80c5c3de20>

for (i in 1:5) {
  y[[i]] <- y[[i]] - medians[[i]]
}
#> tracemem[0x7f80c5c3de20 -> 0x7f80c48de210]: 

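I don't understand why a copy is made here. The book says that "if an object has a single name bound to it, R will modify it in place", and the object that y refers to does have only one name, y, bound to it.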
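The rule quoted from the book can be checked on a plain numeric vector. This is a minimal sketch, not code from the book. It assumes R >= 4.0.0 run outside RStudio, and an R build with memory profiling enabled (`tracemem()` errors otherwise; the CRAN Windows and macOS binaries have it). The helper name `single_vs_shared()` is hypothetical. It uses `capture.output()` to collect whatever `tracemem()` prints while a vector with one name, and then the same vector with two names, is modified.

```r
# Sketch: modifying a vector with one name vs. the same vector with two names.
# Assumes R was built with memory profiling (tracemem() errors otherwise).
single_vs_shared <- function() {
  v <- runif(10)
  tracemem(v)

  # Only `v` refers to the vector: in R >= 4.0.0 this assignment should
  # happen in place, so tracemem() prints nothing.
  single <- capture.output(v[[1]] <- 0)

  # A second name now refers to the same vector, so modifying `w`
  # forces a copy first, which tracemem() reports.
  w <- v
  shared <- capture.output(w[[1]] <- 1)

  untracemem(v)
  untracemem(w)
  list(single = single, shared = shared)
}

res <- single_vs_shared()
res$single  # expected to be empty in R >= 4.0.0
res$shared  # one "tracemem[old -> new]: ..." line
```

Under R < 4.0.0, or when something else holds a reference to the vector, `res$single` may also contain a copy message.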

Mat*_*lke 10

While the comment about references held by RStudio may well be true, it looks like the book is simply out of date.

The last commit to the source of that page was made on June 25, 2019, which predates the release of R v4.0.0.

If you check the R changelog, you will find the following change listed for v4.0.0:

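Reference counting is now used instead of the NAMED mechanism for determining when objects can be safely mutated in base C code. This reduces the need for copying in some cases and should allow further optimizations in the future. It should help make the internal code easier to maintain.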
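One way to look at the reference count this entry refers to is R's internal debugging function `.Internal(inspect())`, which in R >= 4.0.0 includes a `REF(n)` field in the header line it prints for an object. This is a sketch under the assumption that this output format holds: `inspect` is undocumented and its output is not a stable API. The helpers `ref_count()` and `show_refs()` are hypothetical names written for this illustration.

```r
# Sketch: watch the reference count of a vector go up when a second name is bound.
# .Internal(inspect()) is an undocumented debugging tool; its output format
# may change between R versions, so treat the printed text as illustrative.
ref_count <- function(lines) {
  # Extract n from the first "REF(n)" found in inspect()'s output.
  hit <- regmatches(lines, regexpr("REF\\([0-9]+\\)", lines))
  if (length(hit) == 0) return(NA_integer_)
  as.integer(gsub("[^0-9]", "", hit[[1]]))
}

show_refs <- function() {
  v <- runif(3)
  before <- capture.output(.Internal(inspect(v)))
  w <- v  # bind a second name to the same vector
  after <- capture.output(.Internal(inspect(v)))
  c(before = ref_count(before), after = ref_count(after))
}

show_refs()  # in R >= 4.0.0, "after" should be larger than "before"
```

A count above one is what tells R's C code that the vector is shared and must be copied before it is changed.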

R v3.6.3

Indeed, if you run the example code under R v3.6.3 (the release before v4.0.0):

#> R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
#> Copyright (C) 2020 The R Foundation for Statistical Computing
#> Platform: x86_64-w64-mingw32/x64 (64-bit)

x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))

for (i in seq_along(medians)) {
  x[[i]] <- x[[i]] - medians[[i]]
}

cat(tracemem(x), "\n")
#> <000000002457F7D0> 

for (i in 1:5) {
  x[[i]] <- x[[i]] - medians[[i]]
}
#> tracemem[0x000000002457f7d0 -> 0x0000000024697c90]: 
#> tracemem[0x0000000024697c90 -> 0x0000000024697c20]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697c20 -> 0x0000000024697bb0]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697bb0 -> 0x0000000024697b40]: 
#> tracemem[0x0000000024697b40 -> 0x0000000024697ad0]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697ad0 -> 0x0000000024697a60]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697a60 -> 0x00000000246979f0]: 
#> tracemem[0x00000000246979f0 -> 0x0000000024697980]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697980 -> 0x0000000024697910]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697910 -> 0x00000000246978a0]: 
#> tracemem[0x00000000246978a0 -> 0x0000000024697830]: [[<-.data.frame [[<- 
#> tracemem[0x0000000024697830 -> 0x00000000246977c0]: [[<-.data.frame [[<- 
#> tracemem[0x00000000246977c0 -> 0x0000000024697750]: 
#> tracemem[0x0000000024697750 -> 0x00000000246976e0]: [[<-.data.frame [[<- 
#> tracemem[0x00000000246976e0 -> 0x0000000024697670]: [[<-.data.frame [[<- 

untracemem(x)

y <- as.list(x)
cat(tracemem(y), "\n")
#> <0000000024697600> 
 
for (i in 1:5) {
  y[[i]] <- y[[i]] - medians[[i]]
}
#> tracemem[0x0000000024697600 -> 0x00000000247ec708]:

untracemem(y)

We observe 15 copies made for the data frame and one copy made for the list.
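Rather than counting the tracemem lines by eye, the copies can be counted programmatically. In this sketch `count_df_copies()` is a hypothetical helper that builds the same data frame inside a function, runs the loop from the book, and counts the `tracemem[old -> new]` lines it printed. It assumes an R build with memory profiling. The exact count depends on the R version, and since the runs above were done at top level, numbers from inside a function may differ slightly.

```r
# Sketch: count the copies tracemem() reports for the data frame loop.
# count_df_copies() is a hypothetical helper; assumes R was built with
# memory profiling. The count itself depends on the R version.
count_df_copies <- function() {
  x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
  medians <- vapply(x, median, numeric(1))

  tracemem(x)
  out <- capture.output(
    for (i in 1:5) {
      x[[i]] <- x[[i]] - medians[[i]]
    }
  )
  untracemem(x)

  # Each copy is reported on its own "tracemem[old -> new]: ..." line.
  sum(grepl("^tracemem\\[", out))
}

count_df_copies()
```

Because every `x[[i]] <- ...` on a data frame goes through the R-level method `[[<-.data.frame`, which receives x as an argument and modifies it, at least one copy per iteration is to be expected in any version.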

R v4.0.0

However, if we run the same example code under R v4.0.0:

#> R version 4.0.0 (2020-04-24) -- "Arbor Day"
#> Copyright (C) 2020 The R Foundation for Statistical Computing
#> Platform: x86_64-w64-mingw32/x64 (64-bit)

x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))

for (i in seq_along(medians)) {
  x[[i]] <- x[[i]] - medians[[i]]
}

cat(tracemem(x), "\n")
#> <00000000236B0C50> 

for (i in 1:5) {
  x[[i]] <- x[[i]] - medians[[i]]
}
#> tracemem[0x00000000236b0c50 -> 0x00000000237a7a90]: 
#> tracemem[0x00000000237a7a90 -> 0x00000000237a7a20]: [[<-.data.frame [[<- 
#> tracemem[0x00000000237a7a20 -> 0x00000000237a79b0]: 
#> tracemem[0x00000000237a79b0 -> 0x00000000237a7940]: [[<-.data.frame [[<- 
#> tracemem[0x00000000237a7940 -> 0x00000000237a78d0]: 
#> tracemem[0x00000000237a78d0 -> 0x00000000237a7860]: [[<-.data.frame [[<- 
#> tracemem[0x00000000237a7860 -> 0x00000000237a77f0]: 
#> tracemem[0x00000000237a77f0 -> 0x00000000237a7780]: [[<-.data.frame [[<- 
#> tracemem[0x00000000237a7780 -> 0x00000000237a7710]: 
#> tracemem[0x00000000237a7710 -> 0x00000000237a76a0]: [[<-.data.frame [[<- 

untracemem(x)

y <- as.list(x)
cat(tracemem(y), "\n")
#> <00000000237A7630> 

for (i in 1:5) {
  y[[i]] <- y[[i]] - medians[[i]]
}

untracemem(y)

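Here we can see the effect of the change in reducing the number of copies: the data frame is now copied 10 times instead of 15, and the list is not copied at all.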

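To answer the OP's question directly: under the NAMED mechanism, the copy was made unnecessarily. The switch to reference counting in R v4.0.0 avoids that unnecessary copy, so the object is now modified in place as expected.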
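A compact way to check the "modified in place" claim is to compare addresses: `tracemem()` returns the object's current address as a string, also when the object is already being traced, so identical strings before and after the loop mean the list was never replaced by a copy. The helper `list_modified_in_place()` is a hypothetical name for this sketch, which assumes R >= 4.0.0 outside RStudio and an R build with memory profiling.

```r
# Sketch: does the list loop from the book keep the same object?
# list_modified_in_place() is a hypothetical helper; assumes R was built
# with memory profiling so that tracemem() is available.
list_modified_in_place <- function() {
  x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
  medians <- vapply(x, median, numeric(1))
  y <- as.list(x)

  before <- tracemem(y)  # start tracing and record the address
  for (i in 1:5) {
    y[[i]] <- y[[i]] - medians[[i]]
  }
  after <- tracemem(y)   # y is already traced; this just reports its address
  untracemem(y)

  identical(before, after)
}

list_modified_in_place()  # TRUE expected in R >= 4.0.0 outside RStudio
```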


tes*_*ter 0

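As pointed out in the comments, if you run the code in plain R rather than in RStudio, you won't see a copy: the address reported by tracemem() is the same before and after the loop.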

x <- data.frame(matrix(runif(5*1e2), ncol = 5))
medians <- vapply(x, median, numeric(1))
y <- as.list(x)
cat(tracemem(y), "\n")
#> <0000000018A3BB80>
for (i in 1:5) {
  y[[i]] <- y[[i]] - medians[[i]]
}
cat(tracemem(y), "\n")
#> <0000000018A3BB80>
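To see why a single extra reference is enough to bring the copy back, the sketch below binds a second name, `keep`, to the list before the loop. This `keep` binding only stands in for an outside reference of the kind the comments attribute to RStudio; it is an assumption for illustration, not a description of how RStudio works. `copies_with_extra_ref()` is a hypothetical helper, and an R build with memory profiling is assumed.

```r
# Sketch: an extra reference to the list makes the loop copy it once.
# `keep` is a hypothetical stand-in for an outside reference (e.g. one held
# by an IDE); assumes R was built with memory profiling.
copies_with_extra_ref <- function() {
  x <- data.frame(matrix(runif(5 * 1e2), ncol = 5))
  medians <- vapply(x, median, numeric(1))
  y <- as.list(x)
  keep <- y  # second reference to the same list

  tracemem(y)
  out <- capture.output(
    for (i in 1:5) {
      y[[i]] <- y[[i]] - medians[[i]]
    }
  )
  untracemem(y)
  untracemem(keep)

  sum(grepl("^tracemem\\[", out))
}

copies_with_extra_ref()  # expected 1 in R >= 4.0.0
```

Only the first iteration should need to copy: after that, y is bound to a fresh list that nothing else refers to, so the remaining iterations modify it in place.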